r - How do I read data from multiple workbooks with multiple worksheets into R?

Time: 2019-01-02 22:25:06

Tags: r

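I am trying to read data from multiple workbooks that each contain multiple worksheets. There are 10 workbooks, and each workbook has data in two worksheets.

The code below works for pulling the data from the first sheet. However, I would also like to pull the data from the other sheet in the same workbook. I am not sure how to specify the sheet name in the code below.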

library(purrr)
library(readxl)
library(dplyr)
library(tidyr)

data_path <- "C:/Desktop/Test"

files <- dir(data_path, pattern = "*.xlsx")


weights_data <- data.frame(filename = files) %>%
  mutate(file_contents = map(filename,
                             ~ read_excel(file.path(data_path, .))))

View(unnest(weights_data))

1 Answer:

Answer 0: (score: 1)

read_excel takes another argument that lets you specify a particular sheet:

sheet: Sheet to read. Either a string (the name of a sheet), or an
       integer (the position of the sheet). Ignored if the sheet is
       specified via 'range'. If neither argument specifies the
       sheet, defaults to the first sheet.
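
To make the sheet argument concrete, here is a minimal sketch. It assumes the sample workbook that ships with readxl (located with readxl_example("datasets.xlsx")) as a stand-in for one of your own files; the variable names are illustrative only.

```r
library(readxl)

# Assumption: readxl's bundled sample workbook stands in for a real file.
path <- readxl_example("datasets.xlsx")

# List the worksheet names available in the workbook.
sheet_names <- excel_sheets(path)

# With no sheet argument, read_excel() reads the first sheet ...
first_default <- read_excel(path)

# ... which is the same as asking for position 1 explicitly.
first_by_position <- read_excel(path, sheet = 1)

# A sheet can also be selected by name (here, whatever the second name is).
second_by_name <- read_excel(path, sheet = sheet_names[2])
```

Selecting by name is usually the safer choice when sheet order can differ between workbooks.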

So we need to expand your frame of paths to include the sheets, which is easily done with readxl::excel_sheets; for a single path it returns a vector of worksheet names.

An iterative discussion/walk-through follows, though only the last block is needed:

library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(readxl)
data_frame(
  path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
  mutate(sheets = map(path, excel_sheets))
# # A tibble: 3 x 2
#   path                                         sheets
#   <chr>                                        <list>
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" <chr [2]>
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" <chr [2]>
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" <chr [2]>

This alone does not immediately work, but we can unnest the sheets column:

data_frame(
  path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
  mutate(sheets = map(path, excel_sheets)) %>%
  unnest(sheets)
# # A tibble: 6 x 2
#   path                                         sheets
#   <chr>                                        <chr>
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet1
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet2
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet1
# 4 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet2
# 5 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet1
# 6 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet2

It should now be clear that we just need to use map2 or similar to iterate over each row, which gives a nicely nested frame:

data_frame(
  path = list.files(path = "~/StackOverflow/Prah/", pattern = "*.xlsx", full.names = TRUE)
) %>%
  mutate(sheets = map(path, excel_sheets)) %>%
  unnest(sheets) %>%
  mutate(data = map2(path, sheets, ~ read_excel(path = .x, sheet = .y)))
# # A tibble: 6 x 3
#   path                                         sheets data
#   <chr>                                        <chr>  <list>
# 1 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet1 <tibble [32 x 11]>
# 2 "C:\\Users\\r2/StackOverflow/Prah/mt1.xlsx" Sheet2 <tibble [32 x 11]>
# 3 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet1 <tibble [32 x 11]>
# 4 "C:\\Users\\r2/StackOverflow/Prah/mt2.xlsx" Sheet2 <tibble [32 x 11]>
# 5 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet1 <tibble [32 x 11]>
# 6 "C:\\Users\\r2/StackOverflow/Prah/mt3.xlsx" Sheet2 <tibble [32 x 11]>

(I wrote several Excel workbooks, each with two sheets, each sheet containing mtcars. Nothing fancy.)
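
For readers who want to try the path/sheet pipeline without creating workbooks first, the following is a minimal sketch under two assumptions that are not part of the answer: it uses tibble() in place of data_frame(), which newer tibble releases mark as deprecated, and it points at the sample workbook bundled with readxl instead of a directory of your own files.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(readxl)

# Assumption: a bundled sample workbook stands in for real files. For a real
# directory, build this vector with list.files(..., full.names = TRUE) instead.
paths <- readxl_example("datasets.xlsx")

nested <- tibble(path = paths) %>%
  # One list element per workbook, holding that workbook's sheet names.
  mutate(sheets = map(path, excel_sheets)) %>%
  # Expand to one row per (workbook, sheet) pair.
  unnest(sheets) %>%
  # Read each pair; the result is a list-column of data frames.
  mutate(data = map2(path, sheets, ~ read_excel(path = .x, sheet = .y)))

# Each element of the list-column is an ordinary data frame.
first_sheet_data <- nested$data[[1]]
```

If every sheet shares the same columns, as in the mtcars workbooks described above, a further unnest(data) should stack them into one flat data frame while keeping the path and sheets columns as identifiers.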