Question

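I have a large number of folders, each containing csv and htm files (some folders have several csv files, others have only one).

Is it possible to automatically pick out only the folders that contain exactly one csv file and import their data into R or another statistics package?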

Answer 1

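getwd()  # show the directory that will be searched

# list the files in the current working directory
all_files <- list.files()
# keep only the files whose extension is csv
csv_files <- all_files[tolower(tools::file_ext(all_files)) == "csv"]

# read each csv file into its own list element
# (assigning to a single data frame would keep only the last file)
data_files <- list()
for (f in csv_files) {
  data_files[[f]] <- read.csv(f)
}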
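
The loop above reads every csv file in the working directory; it does not yet restrict the import to folders holding exactly one csv file. A minimal sketch of that filter follows, assuming a single recursive listing is acceptable. The function name `single_csv_paths` and its `root` argument are hypothetical, not part of the original answer.

```r
# Hypothetical helper: return the paths of csv files that are
# the only csv file in their directory, searching below `root`.
single_csv_paths <- function(root = ".") {
  # all csv files below root, including sub-directories
  csv_paths <- list.files(root, pattern = "\\.csv$", recursive = TRUE,
                          full.names = TRUE, ignore.case = TRUE)
  # number of csv files found in each directory
  per_dir <- table(dirname(csv_paths))
  # keep only files whose directory holds exactly one csv file
  csv_paths[as.vector(per_dir[dirname(csv_paths)]) == 1L]
}
```

The result can then be passed to a reader, for example `lapply(single_csv_paths("."), read.csv)`.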

Answer 2

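The OP has asked to search all directories for csv files, but to consider only those directories that contain exactly one csv file. Only these files should be imported.

On UNIX systems there are operating system commands such as fgrep that might be usable for this purpose, but I believe the base R solution below should work on any system: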

# define starting dir
path <- file.path("path", "to", "start", "search")
# or path <- file.path(".")
# or path <- getwd()

# find all directories, recursively, i.e., also sub-directories
dirs <- list.dirs(path, recursive = TRUE)

# search all directories for csv files, i.e., file name is ending with csv
# return result as a list with a vector of file names per list element
csv_files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE)

# pick only those list elements which contain exactly one .csv file
# and unlist to get vector of file names.
# note lengths() gets the length of each element of a list
files_to_read <- unlist(csv_files[lengths(csv_files) == 1L])

# read selected files, return result in a list
imported <- lapply(files_to_read, data.table::fread)
# or use a different file reader, alternatively
# imported <- lapply(files_to_read, readr::read_csv)

# name list elements to identify imported data sets
names(imported) <- files_to_read
# or use only the file name
names(imported) <- basename(files_to_read)
# or use only the name of the enclosing directory
names(imported) <- basename(dirname(files_to_read))
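
For repeated use, the steps above can be wrapped in one function and tried on a throwaway directory tree. This is only a sketch: the name `import_single_csv_dirs`, the `reader` argument and the demo folders are assumptions made for illustration, and base `read.csv` is used as the default reader so that no extra packages are needed.

```r
# Hypothetical wrapper around the steps above.
# `reader` defaults to base read.csv (an assumption); pass
# data.table::fread or readr::read_csv if those packages are installed.
import_single_csv_dirs <- function(path = ".", reader = utils::read.csv) {
  dirs <- list.dirs(path, recursive = TRUE)
  csv_files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE)
  # as.character() turns the NULL from an empty selection into character(0)
  files_to_read <- as.character(unlist(csv_files[lengths(csv_files) == 1L]))
  imported <- lapply(files_to_read, reader)
  # label each data set with the name of its enclosing directory
  names(imported) <- basename(dirname(files_to_read))
  imported
}

# Demo: folder "a" holds one csv file, folder "b" holds two
demo_root <- tempfile("csv_demo_")
dir.create(file.path(demo_root, "a"), recursive = TRUE)
dir.create(file.path(demo_root, "b"))
write.csv(data.frame(x = 1:3), file.path(demo_root, "a", "only.csv"), row.names = FALSE)
write.csv(data.frame(x = 1), file.path(demo_root, "b", "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 2), file.path(demo_root, "b", "second.csv"), row.names = FALSE)

imported_demo <- import_single_csv_dirs(demo_root)
str(imported_demo)
```

Only the file from folder "a" should be imported here, since "b" contains more than one csv file. Directory names are not guaranteed to be unique across a deep tree, so naming by full path may be the safer choice in practice.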

How can I select only the folders that contain exactly one CSV file from a large number of folders?
