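I have been using the XLConnect function loadWorkbook to load each xlsx file into R and then rbind to merge them together. What is the best way to do this without creating multiple data frames just to merge them later? I am trying to use the code below to combine my Excel files into 2 data frames (one per sheet name; most files contain both sheets). The columns are always the same, but the file names will change.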
Current / slow
require(XLConnect)
# load each workbook (df and dfb are workbook handles, not data frames)
df <- loadWorkbook(paste(location,'UK.xlsx',sep=""))
dfb <- loadWorkbook(paste(location,'US.xlsx',sep=""))
# read the "School" sheet from both workbooks and stack the rows
UK <-readWorksheet(df,sheet="School",startRow=0,startCol=0,autofitRow=TRUE,endCol=21,header=TRUE)
US <-readWorksheet(dfb,sheet="School",startRow=0,startCol=0,autofitRow=TRUE,endCol=21,header=TRUE)
School <- rbind(UK,US)
# same again for the "College" sheet
UK <-readWorksheet(df,sheet="College",startRow=0,startCol=0,autofitRow=TRUE,endCol=21,header=TRUE)
US <-readWorksheet(dfb,sheet="College",startRow=0,startCol=0,autofitRow=TRUE,endCol=21,header=TRUE)
College <- rbind(UK,US)
New code
require(readxl)
# anchor the pattern so only names ending in .xlsx match
filelist <- list.files(location, pattern = "\\.xlsx$", full.names = TRUE)
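If not every file contains both sheet names, how can I read each sheet name into its own data frame? I need 2 data frames: 1 for School and 1 for College.
I think I need to try something like Schools <- lapply(filelist, read_excel, sheet="School"), but I get the error: Sheet 'School' not found. I think this error occurs because the School sheet is not in every file. I am using list.files because the file names are not always the same.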
Answer 0 (score: 1)
How about this approach?
library(purrr)
library(readxl)
# paths to the Excel files (example names; the output of list.files() works the same way)
files <- sprintf("Mappe%i.xlsx", 1:3)
# read the "School" sheet only from files that contain it;
# map_if() leaves the other elements unchanged, i.e. as plain file paths
xl_school <- map_if(files, ~ "School" %in% excel_sheets(.x), ~ read_excel(.x, sheet = "School"))
# same for the "College" sheet
xl_college <- map_if(files, ~ "College" %in% excel_sheets(.x), ~ read_excel(.x, sheet = "College"))
# keep only the data frames and row-bind them (repeat the same for college)
school_df <- map_df(xl_school, function(x) if (is.data.frame(x)) x)
school_df
#> # A tibble: 3 × 1
#> Test
#> <chr>
#> 1 fdsf
#> 2 543534
#> 3 gfdgfdd
You may need to force the column types to text. Just add col_types = "text" to the read_excel() call:
# read the "School" sheet as text, only from files that contain it
xl_school <- map_if(files, ~ "School" %in% excel_sheets(.x), ~ read_excel(.x, sheet = "School", col_types = "text"))
# read the "College" sheet as text, only from files that contain it
xl_college <- map_if(files, ~ "College" %in% excel_sheets(.x), ~ read_excel(.x, sheet = "College", col_types = "text"))
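If you would rather not depend on purrr, the same idea (check excel_sheets() first, read only the matching files, then bind the rows) can be sketched in base R. This is only a rough sketch under a few assumptions: bind_sheet and the source_file column are hypothetical names made up for this example, readxl is assumed to be installed, and the columns are assumed to have the same names and types in every file, as the question says. The two reader arguments exist only so the helper can be tried out without real workbooks.

```r
# Hypothetical helper (not part of readxl or purrr): read one named sheet
# from every workbook that contains it and stack the rows.
# list_sheets / read_sheet default to the readxl functions.
bind_sheet <- function(files, sheet,
                       list_sheets = readxl::excel_sheets,
                       read_sheet = readxl::read_excel) {
  # TRUE for each workbook that actually contains the requested sheet
  has_sheet <- vapply(files, function(f) sheet %in% list_sheets(f), logical(1))

  parts <- lapply(files[has_sheet], function(f) {
    out <- as.data.frame(read_sheet(f, sheet = sheet))
    # hypothetical extra column: remember which file each row came from
    out$source_file <- rep(basename(f), nrow(out))
    out
  })

  # no workbook had the sheet: return an empty data frame
  if (length(parts) == 0) return(data.frame())

  # assumes identical columns in every file, as stated in the question
  do.call(rbind, parts)
}

# Usage, assuming `location` is the folder from the question:
# filelist <- list.files(location, pattern = "\\.xlsx$", full.names = TRUE)
# School  <- bind_sheet(filelist, "School")
# College <- bind_sheet(filelist, "College")
```

Files without the requested sheet are skipped instead of raising the "Sheet not found" error. If the column types differ between files, rbind may coerce them or fail, so you could pass a wrapper such as function(f, sheet) readxl::read_excel(f, sheet = sheet, col_types = "text") as read_sheet.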