How to select, among a large number of folders, only the folders that contain exactly one CSV file?

Asked: 2017-06-28 14:07:19

Tags: r csv import directory

I have a large number of folders, each containing csv and htm files (some folders contain several csv files, others contain only one).

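Is it possible to automatically pick out the folders that contain only one csv file and import that data into R or another statistics package?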

2 answers:

Answer 0 (score: 0)

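# show the working directory; list.files() looks here
getwd()
all_files <- list.files()

# keep only files whose extension is "csv" (case-insensitive);
# tools::file_ext() also copes with names that have several dots or none
csv_files <- all_files[tolower(tools::file_ext(all_files)) == "csv"]

# read every csv file into a list named by file name,
# so later files do not overwrite earlier ones
data_files <- list()
for (i in seq_along(csv_files)) {
  data_files[[csv_files[i]]] <- read.csv(csv_files[i])
}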
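This answer only scans the working directory and does not check how many csv files a folder holds. A minimal sketch of a recursive variant of the same list.files() idea follows: list every csv file below a starting folder once, count the files per folder with table(dirname(...)), and keep the folders whose count is 1. The temporary folder tree (f1, f2, f3) is hypothetical and exists only to make the sketch runnable; replace `root` with the real starting folder.

```r
# hypothetical demo tree under tempdir(); replace `root` with your own folder
root <- file.path(tempdir(), "csv_count_demo")
for (d in c("f1", "f2", "f3")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}
writeLines("x\n1", file.path(root, "f1", "a.csv"))       # one csv
writeLines("x\n2", file.path(root, "f2", "b.csv"))       # two csv files
writeLines("x\n3", file.path(root, "f2", "c.csv"))
writeLines("<p></p>", file.path(root, "f3", "index.htm")) # no csv

# one recursive listing of every csv file below root
# (ignore.case = TRUE also matches ".CSV")
csv_paths <- list.files(root, pattern = "\\.csv$", recursive = TRUE,
                        full.names = TRUE, ignore.case = TRUE)

# number of csv files in each enclosing folder
counts <- table(dirname(csv_paths))

# folders holding exactly one csv file, and the file inside each of them
single_dirs <- names(counts)[counts == 1]
files_to_read <- csv_paths[dirname(csv_paths) %in% single_dirs]

# import the selected files, named by their enclosing folder
imported <- lapply(files_to_read, read.csv)
names(imported) <- basename(dirname(files_to_read))
str(imported)
```

Folders without any csv file never appear in the listing, so they are skipped without an explicit check.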

Answer 1 (score: 0)

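The OP has asked to search all directories for csv files, but to consider only those directories that contain exactly one csv file. Only these files should be imported.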

On UNIX systems there may be OS commands such as fgrep that could be used for this, but I believe the base R solution below should work on any system:

# define starting dir
path <- file.path("path", "to", "start", "search")
# or path <- file.path(".")
# or path <- getwd()

# find all directories, recursively, i.e., also sub-directories
dirs <- list.dirs(path, recursive = TRUE)

# search all directories for csv files, i.e., file name is ending with csv
# return result as a list with a vector of file names per list element
csv_files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE)

# pick only those list elements which contain exactly one .csv file
# and unlist to get vector of file names.
# note: lengths() returns the length of each element of a list
files_to_read <- unlist(csv_files[lengths(csv_files) == 1L])

# read selected files, return result in a list
imported <- lapply(files_to_read, data.table::fread)
# or, alternatively, use a different file reader
# imported <- lapply(files_to_read, readr::read_csv)

# name list elements to identify imported data sets
names(imported) <- files_to_read
# or use only the file name
# names(imported) <- basename(files_to_read)
# or use only the name of the enclosing directory
# names(imported) <- basename(dirname(files_to_read))
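To try the selection logic without touching real data, here is a self-contained sketch that builds a throwaway folder tree under tempdir() (the folder names one, two and none are hypothetical), applies the same list.dirs()/lengths() steps, and reads the result with base read.csv() so no extra packages are needed:

```r
# hypothetical folder tree: "one" has 1 csv, "two" has 2, "none" has only htm
root <- file.path(tempdir(), "csv_demo")
for (d in c("one", "two", "none")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}
write.csv(data.frame(x = 1:3), file.path(root, "one", "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:2), file.path(root, "two", "b.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:5), file.path(root, "two", "c.csv"), row.names = FALSE)
writeLines("<html></html>", file.path(root, "none", "page.htm"))

# same selection steps as above
dirs <- list.dirs(root, recursive = TRUE)
csv_files <- lapply(dirs, list.files, pattern = "\\.csv$", full.names = TRUE)
files_to_read <- unlist(csv_files[lengths(csv_files) == 1L])

# base R reader, named by the enclosing directory
imported <- lapply(files_to_read, read.csv)
names(imported) <- basename(dirname(files_to_read))
str(imported)
```

Only the folder "one" should be selected: "two" holds two csv files, and "none" and the root folder itself hold none.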