Question

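I need to read specific sheets from a list of Excel files. I have more than 500 Excel files, both ".xls" and ".xlsx". Each file can contain different sheets, but I only want to read the sheets whose names match a particular expression, e.g. pattern = "^Abc", and not every file has a sheet matching this pattern. I have written code that reads one file, but when I try to extend it to multiple files it always returns an error.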

# example with 3rd file
# 2 sheets have the pattern

library(readxl)

# full.names = TRUE keeps the folder in each path so the files can be opened
list_excels <- list.files(path = "path_to_folder", pattern = "\\.xlsx?$", full.names = TRUE)
sheet_names <- excel_sheets(list_excels[[3]])
list_sheets <- lapply(sheet_names, read_excel, path = list_excels[[3]])
names(list_sheets) <- sheet_names
do.call("rbind", list_sheets[grepl(pattern = "^Abc", sheet_names)])

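However, when I try to write code that reads multiple Excel files, I either get an error or something in the loop slows the computation down.

Here are some examples.

This loop does not return an error, but it takes at least 30 seconds for each element of the list, and I have never waited for it to finish.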

for (i in seq_along(list_excels)) {
  sheet_names <- excel_sheets(list_excels[[i]])
  list_sheets <- lapply(sheet_names, read_excel, path = list_excels[[i]])
  names(list_sheets) <- sheet_names
  list_sheets[grepl(pattern = "^Abc", sheet_names)]
}

This loop is still missing the last part, which is combining the sheets selected with this code:

list_sheets[grepl(pattern = "^Abc", sheet_names)]
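
A likely cause of the slowness is that the loop reads every sheet of every file and only filters afterwards. A minimal sketch of the opposite order, filtering the sheet names first and reading only the matches, is below. The helper name `read_matching_sheets` is hypothetical, the demo uses the example workbooks that ship with readxl (with "^mt" standing in for "^Abc"), and it assumes the matching sheets share the same columns.

```r
library(readxl)

# Hypothetical helper (not from the original post): look at the sheet names
# first and read only the sheets whose names match `pattern`.
read_matching_sheets <- function(path, pattern = "^Abc") {
  sheets <- excel_sheets(path)
  wanted <- sheets[grepl(pattern, sheets)]
  if (length(wanted) == 0) return(NULL)  # no matching sheet in this file
  # Assumes the matching sheets share the same columns, as rbind requires
  do.call(rbind, lapply(wanted, function(s) read_excel(path, sheet = s)))
}

# Demo on the example workbooks bundled with readxl; "^mt" stands in for "^Abc"
demo_files <- c(readxl_example("datasets.xlsx"), readxl_example("datasets.xls"))
per_file <- lapply(demo_files, read_matching_sheets, pattern = "^mt")
names(per_file) <- basename(demo_files)
per_file <- Filter(Negate(is.null), per_file)  # drop files without a match
```

With the real data, `demo_files` would be replaced by `list_excels` and the pattern by "^Abc".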

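I tried to add up the rows of each sheet and store the totals in a vector, but I think the loop breaks when a file has no sheet matching the pattern.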

x <- c()
for(i in seq_along(list_excels)) {
  x[i] <- nrow(do.call("rbind",
                       lapply(excel_sheets(list_excels[[i]]),
                              read_excel,
                              path = list_excels[[i]])[grepl(pattern = "^Abc",
                                                          excel_sheets(list_excels[[i]]))]))
}
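
A possible explanation for the break: when no sheet matches, `do.call("rbind", list())` returns `NULL`, so `nrow()` also returns `NULL`, and a `NULL` cannot be stored as an element of a numeric vector. A hedged sketch that returns 0 for such files is below; `count_matching_rows` is a hypothetical name, and the demo again uses readxl's bundled example workbooks with "^mt" in place of "^Abc".

```r
library(readxl)

# Hypothetical helper: total number of rows over the sheets whose names
# match `pattern`, or 0 when the file has no matching sheet.
count_matching_rows <- function(path, pattern = "^Abc") {
  sheets <- excel_sheets(path)
  wanted <- sheets[grepl(pattern, sheets)]
  if (length(wanted) == 0) return(0L)
  sum(vapply(wanted,
             function(s) nrow(read_excel(path, sheet = s)),
             integer(1)))
}

# Demo on the example workbooks bundled with readxl
demo_files <- c(readxl_example("datasets.xlsx"), readxl_example("datasets.xls"))
counts <- vapply(demo_files, count_matching_rows, integer(1), pattern = "^mt")
no_match <- count_matching_rows(demo_files[1], pattern = "^Abc")
```

Counting rows sheet by sheet also avoids `rbind`, so this version does not require the matching sheets to have identical columns.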

I also tried the purrr library, which reads everything and gives the same result as the first loop example.

list_test <- list()
for(i in seq_along(list_excels)) {
  list_test[[i]] <- excel_sheets(list_excels[[i]]) %>%
                              set_names() %>%
                              map(read_excel, path = list_excels[[i]])
}

The last example works for one Excel file but not for multiple Excel files. It only reads a sheet by its exact name.

# One file works
data.frame(readWorksheetFromFile(list_excels[[1]], sheet = "Abc"))
# Multiple files return an error
for (i in seq_along(list_excels)) {
  data.frame(readWorksheetFromFile(list_excels[[i]], sheet = "Abc"))
}
# Returns the following error
# Error: IllegalArgumentException (Java): Sheet index (-1) is out of range (0..1)
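
The "Sheet index (-1)" message suggests that at least one file has no sheet named exactly "Abc". One way to guard against that is to check the sheet names before reading. The sketch below swaps in readxl for XLConnect to do the check, so it is not the code from the post; `read_sheet_if_present` is a hypothetical name, and the demo uses a workbook bundled with readxl.

```r
library(readxl)

# Hypothetical helper: read `sheet` only if the workbook actually contains
# a sheet with that exact name, otherwise return NULL instead of failing.
read_sheet_if_present <- function(path, sheet = "Abc") {
  if (!sheet %in% excel_sheets(path)) return(NULL)
  as.data.frame(read_excel(path, sheet = sheet))
}

# Demo on an example workbook bundled with readxl
demo_file <- readxl_example("datasets.xlsx")
present <- read_sheet_if_present(demo_file, "iris")  # sheet exists
absent  <- read_sheet_if_present(demo_file, "Abc")   # sheet does not exist
```

Note that this matches the exact name "Abc", whereas the pattern "^Abc" matches any sheet name that starts with "Abc".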

Can anyone help me?

Read specific sheets from a list of Excel files
