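I need to load the data for my master's thesis into an R data frame. The data is stored in 74 Excel workbooks. Each workbook has 4 worksheets, named animals, features, r_words, and verbs. All worksheets have the same 12 variables (start time, word, end time, ID, etc.). I want to stack all of these worksheets on top of each other, so the resulting data frame should have 12 columns, and the number of rows depends on how many answers the 74 subjects produced.

I would like to use the tidyverse readxl package and follow this article: https://readxl.tidyverse.org/articles/articles/readxl-workflows.html#csv-caching-and-iterating-over-sheets.

The first problem I ran into is how to read all 4 worksheets with read_excel(path, sheet = "animals", "features", "r_words", "verbs"). This only works for the first worksheet, so I tried building a list of all the sheet names (the object sheet). That did not work either. When I try to use the following code for only one worksheet, the next line throws an error: Error in basename(.) : a character vector argument expected

So here is part of my code; I hope it is enough to show the problem: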
filenames <- list.files("data", pattern = '\\.xlsm',full.names = TRUE)
# indices
subfile_nos <- 1:length(filenames)
# function to read all the sheets in at once and cache to csv
read_then_csv <- function(sheet, path) {
  for (i in 1:length(filenames)){
    sheet <- excel_sheets(filenames[i])
    len.sheet <- 1:length(sheet)
    path <- read_excel(filenames[i], sheet = sheet[i]) #only reading in the first sheet
    pathbase <- path %>%
      basename() %>% #Error in basename(.) : a character vector argument expected
      tools::file_path_sans_ext()
    path %>%
      read_excel(sheet = sheet) %>%
      write_csv(paste0(pathbase, "-", sheet, ".csv"))
  }
}
Answer 0 (score: 1)
You should use a double loop or a nested map, like this:
library(dplyr)   # provides the %>% pipe
library(purrr)
library(readxl)
library(readr)   # write_csv() comes from readr
# I suggest looking at the help page: ?purrr::map_df
# Function to read all the sheets in at once and save as csv
read_then_csv <- function(input_filenames, output_file) {
  # Iterate over files and concatenate results
  map_df(input_filenames, function(f) {
    # Iterate over sheets and concatenate results
    excel_sheets(f) %>%
      map_df(function(sh) {
        read_excel(f, sheet = sh)
      })
  }) %>%
    # Write csv
    write_csv(output_file)
}
# Test function
filenames <- list.files("data", pattern = '\\.xlsm$', full.names = TRUE)
read_then_csv(filenames, 'my_output.csv')
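One thing the combined data frame does not record is which workbook and which sheet each row came from. Below is a minimal sketch of one way to keep that information. It relies on the `.id` argument of purrr's `map_dfr()` (the same function as `map_df()`). The helper names `read_workbook()` and `read_all_workbooks()` and the column names `source_file` and `source_sheet` are hypothetical, chosen only for illustration, and are not part of the original answer:

```r
library(purrr)
library(readxl)

# Hypothetical helper: read every sheet of one workbook and stack the rows,
# adding a "source_sheet" column that holds the sheet name.
read_workbook <- function(path) {
  sheets <- excel_sheets(path)
  names(sheets) <- sheets  # map_dfr() takes the .id values from the names
  map_dfr(sheets, function(sh) read_excel(path, sheet = sh),
          .id = "source_sheet")
}

# Hypothetical helper: do the same for many workbooks, adding a
# "source_file" column that holds the file name without its extension.
read_all_workbooks <- function(paths) {
  names(paths) <- tools::file_path_sans_ext(basename(paths))
  map_dfr(paths, read_workbook, .id = "source_file")
}
```

Usage would look like `all_data <- read_all_workbooks(filenames)`. Note that `basename()` is applied to the file path here, which is a character vector, not to a data frame returned by `read_excel()`. Row binding may fail if the same column is read with different types in different sheets; in that case, passing `col_types` to `read_excel()` could help.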