Question

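I need to use R to read specific csv files stored in multiple directories. Every directory contains these files (along with other files), but the files have different names in each directory. Each name does contain an identifiable distinguishing character.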

Suppose the csv files I want to read contain the following distinguishing characters: "1" (file 1) and "2" (file 2).

This is the code I have tried so far:

# This is the main directory where all the sub-directories with files are stored
common_path = "~/my/main/directory"

# Extract the names of the sub-directories (list.dirs skips plain files)
primary_dirs = list.dirs(common_path, full.names = FALSE, recursive = FALSE)

# Create an empty list of lists: one inner list per directory
data_lst = rep(list(list()), length(primary_dirs))

# These are the 2 files (by code) that I need to read
names_csv = c('1', '2')

#### Nested for loop reading the csv files into the list of lists
for (i in seq_along(primary_dirs)) {

    for (j in seq_along(names_csv)) {

        # [[j]] stores the whole data frame; [j] would keep only its first column
        data_lst[[i]][[j]] = read.csv(file.path(common_path, primary_dirs[i],
                                                paste0('name_file', names_csv[j], '.csv')))

    }
}
### End of nested loop

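The problem is that this code only works when the files have the same names in every directory, and that is not the case here. Each directory has different file names, but the names all contain the distinguishing characters "1" and "2".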

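For example, in the code above the files in every directory are called 'name_file1.csv' and 'name_file2.csv'. In my real case, however, the file names look like this: directory 1 -> 'name_bla_1.csv', 'name_bla_2.csv'; directory 2 -> 'name_gya_1.csv', 'name_gya_2.csv'; and so on...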

How can I read these two files from all of my directories when the file names differ?

Thanks

Answer 1

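You're making this too complicated. list.files can search recursively (through subdirectories), it can return full file paths so you don't have to paste the paths together yourself, and it can match a regular expression pattern.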

files_to_read = list.files(
  path = common_path,        # directory to search within
  pattern = ".*(1|2).*csv$", # regex pattern, some explanation below
  recursive = TRUE,          # search subdirectories
  full.names = TRUE          # return the full path
)
data_lst = lapply(files_to_read, read.csv)  # read all the matching files
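To try this without touching real data, here is a minimal, self-contained sketch that builds a throwaway directory tree under tempdir() and runs the same list.files() call on it. The directory names dir1/dir2, the CSV contents and the extra notes.txt file are all made up for illustration; only the file names follow the examples in the question.

```r
# Hypothetical directory tree under tempdir(); file names mirror the question's examples
common_path <- file.path(tempdir(), "answer1_demo")
for (d in c("dir1", "dir2")) {
  dir.create(file.path(common_path, d), recursive = TRUE, showWarnings = FALSE)
}

# The wanted CSVs (made-up contents) plus one unrelated file that should be skipped
wanted <- c("dir1/name_bla_1.csv", "dir1/name_bla_2.csv",
            "dir2/name_gya_1.csv", "dir2/name_gya_2.csv")
for (f in wanted) {
  write.csv(data.frame(x = 1:3), file.path(common_path, f), row.names = FALSE)
}
writeLines("not a csv", file.path(common_path, "dir1", "notes.txt"))

# Same call as in the answer; the pattern is applied to file names, not directory names
files_to_read <- list.files(
  path = common_path,
  pattern = ".*(1|2).*csv$",
  recursive = TRUE,
  full.names = TRUE
)
data_lst <- lapply(files_to_read, read.csv)

# Label each data frame with the file it came from
names(data_lst) <- basename(files_to_read)
```

Naming the list elements is optional, but it makes it much easier to tell which data frame came from which file once everything is in one flat list.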

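To learn more about regular expressions, I recommend regex101.com. Here, .* matches any sequence of characters (including an empty one), (1|2) matches 1 or 2, and $ matches the end of the string, so ".*(1|2).*csv$" will match every string that contains a 1 or a 2 and ends in csv.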
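A quick way to check what a pattern matches before pointing list.files() at real data is to run it through grepl() on some sample names. This sketch uses names from the question plus three made-up ones. The stricter "_(1|2)\\.csv$" variant is only a suggestion, not part of the answer, and assumes the distinguishing digit always sits directly before ".csv".

```r
# Names from the question, plus hypothetical extras to show edge cases
candidates <- c("name_bla_1.csv", "name_gya_2.csv", "notes.txt",
                "name_bla_3.csv", "name_bla_10.csv")

# The pattern from the answer: any 1 or 2 anywhere, then "csv" at the end
loose_hits <- grepl(".*(1|2).*csv$", candidates)
# returns TRUE TRUE FALSE FALSE TRUE ("name_bla_10.csv" matches because it contains a 1)

# Stricter variant: an underscore, then exactly 1 or 2, then ".csv" at the end
strict_hits <- grepl("_(1|2)\\.csv$", candidates)
# returns TRUE TRUE FALSE FALSE FALSE
```

The loose pattern is fine if no other file names contain a 1 or a 2; if they might, anchoring the digit next to the extension avoids accidental matches.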

Answer 2

If you just want to read every file with a matching name from any subdirectory, you can try the following:

# [A-Za-z] rather than [A-z]: in ASCII, A-z also spans the punctuation between Z and a
regular_expression <- "name_[A-Za-z]+_"
names_csv <- c('1', '2')
# Builds one alternation: "name_[A-Za-z]+_1\\.csv|name_[A-Za-z]+_2\\.csv"
names_to_read <- paste0(regular_expression, names_csv, "\\.csv", collapse = "|")
fileList <- list.files(pattern = names_to_read, path = common_path,
                       recursive = TRUE, full.names = TRUE)
data_lst <- lapply(fileList, read.csv)
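To see exactly which pattern paste0() builds and which names it accepts, you can print it and test it with grepl() on a few sample names. This is a small sketch; the sample names come from the question except "other_1.csv", which is a made-up non-match. The letter class is written as [A-Za-z].

```r
regular_expression <- "name_[A-Za-z]+_"
names_csv <- c("1", "2")

# collapse = "|" joins the per-file patterns into a single alternation
names_to_read <- paste0(regular_expression, names_csv, "\\.csv", collapse = "|")
print(names_to_read)  # print() shows each backslash doubled

sample_names <- c("name_bla_1.csv", "name_gya_2.csv", "name_bla_3.csv", "other_1.csv")
matches <- grepl(names_to_read, sample_names)
# returns TRUE TRUE FALSE FALSE
```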

The output should be a list in which each element is one of your csv files.

It wasn't clear to me whether you want to keep the files separated by the directory they were read from, so I haven't done that.
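If you do want to keep the data separated by directory, like the list of lists in the question, one option (a sketch, not part of either answer) is to split the matched paths by dirname() before reading. The temporary tree and its contents below are hypothetical stand-ins for the real directories.

```r
# Hypothetical tree: two directories with differently named files, as in the question
common_path <- file.path(tempdir(), "grouping_demo")
wanted <- c("dir1/name_bla_1.csv", "dir1/name_bla_2.csv",
            "dir2/name_gya_1.csv", "dir2/name_gya_2.csv")
for (f in wanted) {
  dir.create(dirname(file.path(common_path, f)), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(x = 1:3), file.path(common_path, f), row.names = FALSE)
}

files_to_read <- list.files(common_path, pattern = ".*(1|2).*csv$",
                            recursive = TRUE, full.names = TRUE)

# Group paths by their parent directory: one element per directory
by_dir <- split(files_to_read, dirname(files_to_read))

# Read each group into an inner list named after the files
data_lst <- lapply(by_dir, function(paths) {
  setNames(lapply(paths, read.csv), basename(paths))
})

# Use the short directory names instead of full paths for the outer list
names(data_lst) <- basename(names(data_lst))
```

With this layout, data_lst$dir1 holds that directory's two data frames, which mirrors the data_lst[[i]][[j]] structure the original loop was aiming for.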
