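I used the 'GDELTtools' package to download data from GDELT. The download succeeded, but no variable was created in the global environment. I want to store the data in a data frame so that I can analyze it.
The folder contains more than 30 zip files, each holding a single CSV. I need to load all of these CSVs into one variable in R's global environment. I hope this is possible.
Thanks in advance!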
Answer 0: (score: 0)
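I haven't written R in a while, so I'll do my best.
Read the comments carefully, since they explain the procedure.
I'll attach links for looking up the following: unzip, readCSV, mergeDataFrames, emptyDataFrame, concatenateStrings
According to the GDELTtools docs, you can easily specify the download folder by passing local.folder = "~/gdeltdata" as an argument to the GetGDELT() function.
After that, you can use the list.files("path/to/files/directory") function to get the vector of file names used in the commented code below. See the docs for more examples and explanations.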
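The folder-listing step can be sketched as follows. The folder and file names here are made up for illustration (a temporary directory stands in for "~/gdeltdata"). Passing full.names = TRUE returns complete paths, so nothing has to be pasted together afterwards.

```r
# Hypothetical stand-in for the download folder (e.g. "~/gdeltdata")
folder <- file.path(tempdir(), "gdeltdata_demo")
dir.create(folder, showWarnings = FALSE)

# Create a few empty placeholder files so the example has something to list
file.create(file.path(folder, c("20140101.export.CSV.zip",
                                "20140102.export.CSV.zip",
                                "notes.txt")))

# Keep only names ending in ".zip"; full.names = TRUE returns full paths
zipPaths <- list.files(folder, pattern = "\\.zip$", full.names = TRUE)

# Show just the file names that were matched
basename(zipPaths)
```

The trailing $ in the pattern anchors the match to the end of the name, so a file such as "notes.txt" is skipped.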
# set the path for the unzip output
outDir <- "C:\\Users\\Name\\Documents\\unzipfolder"
# path to the directory where the zip files are stored
relativePath <- "C:\\path\\to\\my\\directory\\"
# create a variable to store all the paths to the zip files in a vector
zipPaths <- character()
# since we have 30 files we should iterate through them
# I assume you have a vector with the file names (without extension) in the variable fileNamesZip
for (name in fileNamesZip) {
  # use paste0() to concatenate strings
  zipfilepath <- paste0(relativePath, name, ".zip")
  # append the file path; append() returns a new vector, so assign it back
  zipPaths <- append(zipPaths, zipfilepath)
}
# now we have a vector which contains all the paths to the zip files
# unzip() takes one archive at a time via its zipfile argument, so call it once per path (read the official docs)
for (zipfilepath in zipPaths) {
  unzip(zipfile = zipfilepath, exdir = outDir)
}
# initialize a data frame for all the data. You must provide data types for the columns.
total <- data.frame(Doubles=double(),
                    Ints=integer(),
                    Factors=factor(),
                    Logicals=logical(),
                    Characters=character(),
                    stringsAsFactors=FALSE)
# now it's time to read the csv files and store them in the data frame.
# again, I assume you have a vector with the file names (without extension) in the variable fileNamesCSV
for (name in fileNamesCSV) {
  # create the csv file path; file.path() inserts the separator between folder and file name
  csvfilepath <- file.path(outDir, paste0(name, ".csv"))
  # read data from the csv file and store it in a data frame
  dataFrame <- read.csv(file=csvfilepath, header=TRUE, sep=",")
  # append the rows to the combined data frame; this only works if the data frames have the same columns
  total <- rbind(total, dataFrame)
}
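The read-and-combine step can also be written without growing a data frame inside a loop. Here is a minimal sketch using only base R, with two small made-up CSV files written to a temporary folder so it runs on its own. The folder name and the columns id and value are illustrative, not GDELT's.

```r
# Hypothetical folder standing in for the unzip output directory
csvDir <- file.path(tempdir(), "unzipfolder_demo")
dir.create(csvDir, showWarnings = FALSE)

# Two small made-up CSV files with identical columns
write.csv(data.frame(id = 1:2, value = c("a", "b")),
          file.path(csvDir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")),
          file.path(csvDir, "part2.csv"), row.names = FALSE)

# Read every CSV in the folder into a list of data frames
csvPaths <- list.files(csvDir, pattern = "\\.csv$", full.names = TRUE)
frames <- lapply(csvPaths, read.csv, stringsAsFactors = FALSE)

# Stack all data frames row-wise in a single call
total <- do.call(rbind, frames)
nrow(total)
```

As with the loop, rbind() needs every file to have the same columns.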
Answer 1: (score: 0)
Something possibly simpler:
list.files() lists the files in a directory
readr::read_csv() automatically unzips files as needed
dplyr::bind_rows() combines data frames
So try:
lf <- list.files(pattern="\\.zip$")
dfs <- lapply(lf,readr::read_csv)
result <- dplyr::bind_rows(dfs)
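One practical catch with this approach: bind_rows() stops with an error if the same column was guessed as different types in different files. The sketch below sidesteps that by reading every column as text, and it also records which file each row came from. The function name read_all_zips and the source_file column are hypothetical, and the code assumes readr and dplyr are installed. If the files turn out to be tab-delimited without a header row, which I believe is the case for GDELT's event exports, readr::read_tsv with col_names = FALSE would take the place of read_csv.

```r
library(readr)
library(dplyr)

# Hypothetical helper: read every matching file in `folder` into one data frame
read_all_zips <- function(folder = ".", pattern = "\\.zip$") {
  paths <- list.files(folder, pattern = pattern, full.names = TRUE)
  # Read all columns as character so types cannot clash between files
  dfs <- lapply(paths, read_csv,
                col_types = cols(.default = col_character()))
  # Name each list element so bind_rows() can record the source file
  names(dfs) <- basename(paths)
  bind_rows(dfs, .id = "source_file")
}
```

It could be called as result <- read_all_zips("~/gdeltdata"); readr::type_convert(result) can then re-guess the column types once everything is combined.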