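I'm starting to go round in circles. I feel I've searched the web thoroughly, but coming back to this problem after a few days, I suspect I can no longer see the wood for the trees.
I want to scrape multiple sets of data from thousands of Excel files on the company SharePoint. I have been able to scrape successfully using readxl.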
library(readxl)
library(data.table)
library(XLConnect)
root_URL <- '//companyname.office.abc.com/sites/thesite/thefolder'
folder.list <- list.dirs(root_URL)
file.list <- list.files(folder.list, pattern = "\\.(xlsx|xls|xlsm|xlsb)$", ignore.case = TRUE, full.names = TRUE, include.dirs = TRUE)
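This produces a nice list of all the files I might need to scrape from. Using the code below, I have successfully extracted the data I need from a specific tab ("Address") in the 3rd, 4th and 5th files in the list.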
ex.list <- file.list[3:5]
ex.list <- setNames(ex.list, ex.list)
df.list <- lapply(ex.list, read_excel, sheet = 'Address' )
df.list <- Map(function(df, name) {
df$source_name <- name
df
}, df.list, names(df.list))
df <- rbindlist(df.list, idcol = "id")
write.csv(df,"testdata1.csv")
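The problem I have is that the first, second (and other) files don't have a tab named "Address", and I need to exclude those files from my file.list. Because the result is a list of character vectors, I'm struggling to filter it so that files without a tab named "Address" are excluded.
I got the following results with lapply, and even tried sapply (also shared), but I'm now struggling to write the conditional statement. It feels so close and yet so far.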
> aa <- lapply(ex.list, excel_sheets)
> aa
[[1]]
[1] "NODE SIDE A" "NODE SIDE B" "LMA" "BASE" "TUBE" "Notes"
[[2]]
[1] "NODE SIDE A" "LMA" "BASE" "TUBE" "Notes"
[[3]]
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
[[4]]
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
[[5]]
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
> bb <- sapply(ex.list, excel_sheets)
> bb
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file1.xls`
[1] "NODE SIDE A" "NODE SIDE B" "LMA" "BASE" "TUBE" "Notes"
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file2.xls`
[1] "NODE SIDE A" "LMA" "BASE" "TUBE" "Notes"
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file3.xls`
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file4.xls`
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file5.xls`
[1] "Equipment-Details" "Address" "Drop Down Values" "Validation Status" "EquipMaster"
Answer 0 (score: 1)
I think this should work:
library(readxl)
df.list <- lapply(ex.list, function(x)
if ("Address" %in% excel_sheets(x)) read_excel(x,sheet = 'Address')
else NULL)
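The lapply() above leaves a NULL entry for every file without an "Address" sheet, and the question's Map() step would then try to add source_name to those NULLs. A minimal sketch of one way to drop them first, using base R only; the mock df.list, its file names, and the street column are invented for illustration and stand in for the real read_excel() results:

```r
# Mock result of the lapply() above: NULL where the "Address" sheet was absent.
# File names and the `street` column are hypothetical.
df.list <- list(
  "file1.xls" = NULL,
  "file3.xls" = data.frame(street = "1 Main St", stringsAsFactors = FALSE),
  "file4.xls" = data.frame(street = "2 High St", stringsAsFactors = FALSE)
)

# Drop the NULL entries so only successfully read sheets remain.
df.list <- Filter(Negate(is.null), df.list)

# Tag each data frame with the file it came from, as in the question.
df.list <- Map(function(df, name) {
  df$source_name <- name
  df
}, df.list, names(df.list))

# Stack everything into one data frame (base R equivalent of the bind step).
df <- do.call(rbind, df.list)
rownames(df) <- NULL
```

In the question's own pipeline the last step could stay as rbindlist(df.list, idcol = "id") from data.table; do.call(rbind, ...) is used here only to keep the sketch free of extra packages.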
Answer 1 (score: 0)
Once you have read in the sheet names of all the files, you can filter the list with:
aa <- list(c("A", "B", "C"),
c("A", "B", "Address"),
c("A", "B", "Address"),
c("A", "B", "C"))
aa[grep(pattern = "Address", aa)]
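As far as I can tell, grep() coerces each list element to a single string before matching, so it tests for the substring "Address" and may also pick up a sheet called, say, "Address Old". If an exact sheet name is wanted, a sketch along these lines might be safer; the "Address Old" entry is a made-up case added to show the difference:

```r
# Sheet names per file; the third entry is a hypothetical near-miss.
aa <- list(c("A", "B", "C"),
           c("A", "B", "Address"),
           c("A", "B", "Address Old"),
           c("A", "B", "C"))

# TRUE only where a sheet is named exactly "Address".
has_address <- vapply(aa, function(sheets) "Address" %in% sheets, logical(1))

# Keep just the matching elements.
aa[has_address]
```

The same logical vector can index the vector of file paths, for example ex.list[has_address], provided aa was built from ex.list in the same order.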