How to store a folder containing more than 30 zip files in a variable in R

Time: 2018-02-02 08:32:13

Tags: r csv zip zipfile

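I used the package 'GDELTtools' to download data from GDELT. The data has been downloaded, but no variable was created in the global environment. I want to store the data in a data frame variable so that I can analyze it.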

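The folder contains more than 30 zip files, and each zip file contains one CSV. I need to store all of these CSVs in a single variable in R's global environment. Is there a way to do this?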

Thanks in advance!

2 answers:

Answer 0 (score: 0)

I haven't written R in a while, so I'll do my best.

Read the comments in the code carefully, since they explain each step.

Useful references on the following topics: unzip, reading CSV files, merging data frames, creating an empty data frame, and concatenating strings.

According to the GDELTtools docs, you can easily choose the download folder by passing local.folder = "~/gdeltdata" as an argument to the GetGDELT() function.

After that, you can use list.files("path/to/files/directory") to get the vectors of file names used in the commented code below. See the docs for more examples and explanations.
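A minimal sketch of how list.files() produces the file-name vectors that the code relies on. So that it runs anywhere, it creates a temporary folder holding a few empty files with made-up names (the names are hypothetical, chosen only for illustration). With real data you would point it at your local.folder instead.

```r
# Throwaway folder with hypothetical file names, for illustration only
demoDir <- file.path(tempdir(), "gdelt_demo")
dir.create(demoDir, showWarnings = FALSE)
invisible(file.create(file.path(demoDir, c("20180101.export.CSV.zip",
                                           "20180102.export.CSV.zip",
                                           "notes.txt"))))

# pattern is a regular expression: "\\.zip$" keeps only names ending in ".zip"
# (names are returned relative to demoDir, sorted alphabetically)
fileNamesZip <- list.files(demoDir, pattern = "\\.zip$")

# full.names = TRUE returns full paths that can be passed straight to unzip()
zipPaths <- list.files(demoDir, pattern = "\\.zip$", full.names = TRUE)

print(fileNamesZip)
```

Using full.names = TRUE avoids building the paths by hand with paste0(), which is where missing path separators usually creep in.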

# folder where unzip() will write the extracted CSV files
outDir <- "C:\\Users\\Name\\Documents\\unzipfolder"
# folder where the downloaded zip files are stored
zipDir <- "C:\\path\\to\\my\\directory"
# vector with the names of all zip files in that folder
fileNamesZip <- list.files(zipDir, pattern = "\\.zip$")
# build a vector with the full path to every zip file
zipPaths <- character(0)
for (name in fileNamesZip) {
  # file.path() joins the folder and the file name with a separator
  zipfilepath <- file.path(zipDir, name)
  # append() returns a new vector, so assign the result back
  zipPaths <- append(zipPaths, zipfilepath)
}
# unzip() extracts one archive per call, so loop over the zip paths
for (zipfilepath in zipPaths) {
  unzip(zipfilepath, exdir = outDir)
}
# vector with the names of the extracted CSV files (case-insensitive match,
# so both ".csv" and ".CSV" are picked up)
fileNamesCSV <- list.files(outDir, pattern = "\\.csv$", ignore.case = TRUE)
# read every CSV file into a data frame and collect them in a list
dataFrames <- list()
for (name in fileNamesCSV) {
  # create the csv file path
  csvfilepath <- file.path(outDir, name)
  # read the csv file; adjust header and sep to match your files
  dataFrames[[length(dataFrames) + 1]] <- read.csv(file = csvfilepath, header = TRUE, sep = ",")
}
# stack the data frames row by row into a single data frame. rbind() only works
# if all of them have the same column names; merge() would instead join them on key columns
total <- do.call(rbind, dataFrames)
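Why the last step stacks rows with rbind() rather than calling merge(): each CSV holds different rows of the same table, so the pieces have to be appended, not joined. A small sketch with two made-up data frames (the column names EventID and Country are hypothetical) shows the difference.

```r
# Two hypothetical chunks of the same table, as if read from two CSV files
day1 <- data.frame(EventID = c(1, 2), Country = c("US", "FR"), stringsAsFactors = FALSE)
day2 <- data.frame(EventID = c(3, 4), Country = c("DE", "US"), stringsAsFactors = FALSE)

# rbind() appends rows; both data frames must have the same column names
stacked <- rbind(day1, day2)
nrow(stacked)   # 4

# merge() performs an inner join on the key column; no EventID appears
# in both chunks, so the join returns zero rows
joined <- merge(day1, day2, by = "EventID")
nrow(joined)    # 0

# do.call() passes every element of a list to rbind() in one call,
# which is how a list of per-file data frames becomes one data frame
total <- do.call(rbind, list(day1, day2))
```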

Answer 1 (score: 0)

Something simpler, perhaps:

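  1. list.files() lists the files in a directory
  2. readr::read_csv() automatically decompresses compressed files (including .zip) when reading them
  3. dplyr::bind_rows() combines a list of data frames into one

So try: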

    lf <- list.files(pattern = "\\.zip$")
    dfs <- lapply(lf, readr::read_csv)
    result <- dplyr::bind_rows(dfs)
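dplyr::bind_rows() matches columns by name and fills columns missing from some inputs with NA, whereas base rbind() stops with an error when the column names differ. Below is a rough base-R sketch of that behaviour for cases where dplyr is not available. bindRowsBase is a hypothetical helper, not part of any package, and unlike dplyr it does not check for incompatible column types.

```r
# Hypothetical helper: stack data frames whose columns may differ,
# filling columns missing from a given data frame with NA
bindRowsBase <- function(dfs) {
  # union of all column names, in order of first appearance
  allCols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(allCols, names(df))) {
      df[[col]] <- NA
    }
    df[allCols]  # same column order in every data frame
  })
  do.call(rbind, padded)
}

a <- data.frame(id = 1:2, x = c("a", "b"), stringsAsFactors = FALSE)
b <- data.frame(id = 3L, y = 10)
result <- bindRowsBase(list(a, b))
print(result)
```

When every file has the same columns, as the GDELT CSVs in the question should, plain do.call(rbind, dfs) is enough.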