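I have a large number of folders, each containing csv and htm files (some folders hold several csv files, others only one).
Is it possible to automatically pick out the folders that contain exactly one csv file and import that data into R or another statistics package?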
Answer 0: (score: 0)
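# show the working directory that will be searched
getwd()
# list all files in the working directory
all_files <- list.files()
# extract each file's extension ("" when there is none)
extensions <- tools::file_ext(all_files)
# read every csv file into its own list element
data_files <- list()
for (i in seq_along(all_files)) {
  if (extensions[i] == "csv") {
    data_files[[all_files[i]]] <- read.csv(all_files[i])
  }
}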
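The loop above only looks at csv files in the working directory and does not check how many csv files a folder holds. As a rough sketch of one way to add that check, the files could be counted per folder with `table(dirname(...))`. The helper name `read_single_csv_dirs` and its `path` argument are made up for illustration and are not part of the answer.

```r
# Hypothetical helper (name and argument are assumptions for this sketch).
read_single_csv_dirs <- function(path = ".") {
  # all csv files below `path`, including sub-folders
  all_csv <- list.files(path, pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)
  # number of csv files found in each folder
  n_per_dir <- table(dirname(all_csv))
  # folders that hold exactly one csv file
  keep_dirs <- names(n_per_dir)[n_per_dir == 1L]
  single <- all_csv[dirname(all_csv) %in% keep_dirs]
  # read the selected files and name each element by its file path
  stats::setNames(lapply(single, read.csv), single)
}
```

Calling `read_single_csv_dirs("path/to/start/search")` would then return a named list with one data frame per qualifying folder.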
Answer 1: (score: 0)
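The OP has asked to search all directories for csv files but to consider only those directories that contain exactly one csv file. Only those files should be imported.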
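On a UNIX system, there are operating system commands such as fgrep that could probably be used for this purpose, but I believe the base R solution below should work on any system: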
# define starting dir
path <- file.path("path", "to", "start", "search")
# or path <- file.path(".")
# or path <- getwd()
# find all directories, recursively, i.e., also sub-directories
dirs <- list.dirs(path, recursive = TRUE)
# search all directories for csv files, i.e., file name is ending with csv
# return result as a list with a vector of file names per list element
csv_files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE)
# pick only those list elements which contain exactly one .csv file
# and unlist to get vector of file names.
# note lengths() gets the length of each element of a list
files_to_read <- unlist(csv_files[lengths(csv_files) == 1L])
# read selected files, return result in a list
imported <- lapply(files_to_read, data.table::fread)
# or use a different file reader, alternatively
imported <- lapply(files_to_read, readr::read_csv)
# name list elements to identify imported data sets
names(imported) <- files_to_read
# or use only the file name
names(imported) <- basename(files_to_read)
# or use only the name of the enclosing directory
names(imported) <- basename(dirname(files_to_read))
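To see the selection steps above in action without touching real data, here is a small self-contained sketch that builds a throwaway directory tree and runs the same logic on it. The folder names (`one`, `two`, `none`) and file names are invented for the demo, and base R's `read.csv` stands in for the file reader so that no extra package is needed.

```r
# Build a temporary directory tree (all names here are made up for the demo).
root <- file.path(tempdir(), "csv_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "one"), recursive = TRUE)
dir.create(file.path(root, "two"), recursive = TRUE)
dir.create(file.path(root, "none"), recursive = TRUE)

# "one" gets a single csv, "two" gets two, "none" gets only an htm file.
write.csv(data.frame(x = 1:3), file.path(root, "one", "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:2), file.path(root, "two", "b.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:5), file.path(root, "two", "c.csv"), row.names = FALSE)
writeLines("<html></html>", file.path(root, "one", "page.htm"))
writeLines("<html></html>", file.path(root, "none", "page.htm"))

# Same selection logic as above: list csv files per directory,
# then keep the directories with exactly one match.
dirs <- list.dirs(root, recursive = TRUE)
csv_files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE)
files_to_read <- unlist(csv_files[lengths(csv_files) == 1L])

# Read the selected files and name them after the enclosing directory.
imported <- lapply(files_to_read, read.csv)
names(imported) <- basename(dirname(files_to_read))

# Only the folder "one" should qualify.
print(names(imported))
```

Naming by enclosing directory works well here because each selected directory contributes exactly one file, so the names cannot collide unless two sub-folders in different places share a name.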