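I asked this question before, but that didn't actually resolve anything. I've done more work on it since, and now I'm stuck again!
I have a spreadsheet with two tabs. The first contains three cells I'm interested in (A2, A4, A6), which hold identifying details. The second has a 4x4 grid (A1:D4) containing some financial information.
I can create a data frame, I can locate the data, and to some extent I can extract it. My problem is looping over all the files in a folder, pulling the data from each one, and putting it into the pre-created data frame.
The code below is for reference.
Finding the files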
# Store the result so that file.list exists when it is used below
file.list <- list.files(
  path = "C:/Excel Files",
  pattern = '*.xlsx|*.XLSX',
  full.names = FALSE,
  recursive = FALSE
)
Creating the df
colnames <- c("A2", "A4", "A6", "A1", "B1", "C1", "D1", "A2", "B2", "C2", "D2", "A3", "B3", "C3", "D3", "A4", "B4", "C4", "D4")
output <- matrix(NA,nrow = length(file.list), ncol = length(colnames), byrow = FALSE)
colnames(output) <- c(colnames)
rownames(output) <- c(file.list)
Extracting the data
FirmData1 <- readxl::read_xlsx("N:/Excel Files/test.xlsx", sheet = 2, range = "A1:D1", na = "", col_names = FALSE, col_types = "text")
FirmData2 <- readxl::read_xlsx("N:/Excel Files/test.xlsx", sheet = 2, range = "A2:D2", na = "", col_names = FALSE, col_types = "text")
FirmData3 <- readxl::read_xlsx("N:/Excel Files/test.xlsx", sheet = 2, range = "A3:D3", na = "", col_names = FALSE, col_types = "text")
FirmData4 <- readxl::read_xlsx("N:/Excel Files/test.xlsx", sheet = 2, range = "A4:D4", na = "", col_names = FALSE, col_types = "text")
FirmData <- dplyr::bind_rows(FirmData1, FirmData2, FirmData3, FirmData4)
FirmData <- t(FirmData)
colnames(output)
Firm <- dplyr::bind_rows(FirmInfo, FirmData) %>%
  tidyr::spread(key = Field, value = Value)
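The loop
There is no loop!
Answer 0 (score: 0)
Here is one way to loop over them and bring everything together.
I'll start by creating a spreadsheet to work with. I'm using openxlsx, but only to create the file, not to read it (for that I'll still use readxl).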
wb <- openxlsx::createWorkbook()
openxlsx::addWorksheet(wb, "FirstSheet")
openxlsx::writeDataTable(wb, "FirstSheet", data.frame(t(outer(c("A","B"), 1:6, paste0))), colNames = FALSE)
openxlsx::addWorksheet(wb, "SecondSheet")
openxlsx::writeDataTable(wb, "SecondSheet", mtcars[1:4, 1:4], colNames = FALSE)
openxlsx::saveWorkbook(wb, "quux.xlsx")
readxl::read_xlsx("quux.xlsx", "FirstSheet", range = c("A2:A6"), col_names = "A")
# # A tibble: 5 x 1
# A
# <chr>
# 1 A2
# 2 A3
# 3 A4
# 4 A5
# 5 A6
readxl::read_xlsx("quux.xlsx", "SecondSheet", range = c("A1:D4"), col_names = LETTERS[1:4])
# # A tibble: 4 x 4
# A B C D
# <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110
# 2 21 6 160 110
# 3 22.8 4 108 93
# 4 21.4 6 258 110
First, here is what we want to do with each file:
fn <- "quux.xlsx"
first <- readxl::read_xlsx(fn, "FirstSheet", range = "A2:A6", col_names = "A")
second <- readxl::read_xlsx(fn, "SecondSheet", range = "A1:D4", col_names = LETTERS[1:4])
data.frame(matrix(first$A[c(1,3,5)], nrow = 1), stringsAsFactors = FALSE)
# X1 X2 X3
# 1 A2 A4 A6
data.frame(matrix(t(second), nrow = 1))
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16
# 1 21 6 160 110 21 6 160 110 22.8 4 108 93 21.4 6 258 110
Granted, the names are uninspiring, but that is just aesthetics and can be fixed with colnames.
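As a rough illustration of that renaming step, the sketch below builds one row the same way as above (from mtcars, so no Excel file is needed) and assigns names derived from the cell addresses. The "info_" and "grid_" prefixes are my own assumption, added only because A2 and A4 occur on both sheets and would otherwise produce duplicate column names.

```r
# Stand-in for one file's row: three identifier cells plus the 4x4 grid,
# flattened row by row exactly as in the example above.
row <- cbind(
  data.frame(matrix(c("A2", "A4", "A6"), nrow = 1), stringsAsFactors = FALSE),
  data.frame(matrix(t(as.matrix(mtcars[1:4, 1:4])), nrow = 1))
)

# Hypothetical naming scheme: prefix by sheet so that A2/A4 stay unique.
info_names <- paste0("info_", c("A2", "A4", "A6"))
# LETTERS[1:4] is recycled, giving A1 B1 C1 D1 A2 B2 ... D4 (row-major order).
grid_names <- paste0("grid_", LETTERS[1:4], rep(1:4, each = 4))

colnames(row) <- c(info_names, grid_names)
row[, c("info_A2", "grid_A1", "grid_A3")]
```

Because the grid is flattened row by row, the names have to be generated in the same row-major order; otherwise the labels would no longer line up with the cells they came from.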
Now let's lapply over all of the files and then combine the results into a single frame.
filelist <- c("quux.xlsx", "quux.xlsx", "quux.xlsx")
datlist <- lapply(filelist, function(fn) {
  first <- readxl::read_xlsx(fn, "FirstSheet", range = "A2:A6", col_names = "A")
  second <- readxl::read_xlsx(fn, "SecondSheet", range = "A1:D4", col_names = LETTERS[1:4])
  cbind(
    data.frame(matrix(first$A[c(1,3,5)], nrow = 1), stringsAsFactors = FALSE),
    data.frame(matrix(t(second), nrow = 1))
  )
})
out <- do.call(rbind, datlist)
out
# X1 X2 X3 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16
# 1 A2 A4 A6 21 6 160 110 21 6 160 110 22.8 4 108 93 21.4 6 258 110
# 2 A2 A4 A6 21 6 160 110 21 6 160 110 22.8 4 108 93 21.4 6 258 110
# 3 A2 A4 A6 21 6 160 110 21 6 160 110 22.8 4 108 93 21.4 6 258 110
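The question also wanted each row labelled with the file it came from. One possible way to do that, sketched here with made-up per-file results instead of real workbooks (the paths and the X1/X2 columns are placeholders), is to add a column built from the file list after combining:

```r
# Hypothetical file list and per-file results, standing in for the
# output of the lapply() call above.
filelist <- c("C:/Excel Files/a.xlsx", "C:/Excel Files/b.xlsx")
datlist <- lapply(seq_along(filelist), function(i) {
  data.frame(X1 = paste0("firm", i), X2 = i * 10, stringsAsFactors = FALSE)
})

out <- do.call(rbind, datlist)

# lapply() preserves order, so row i of `out` belongs to filelist[i].
# check.names = FALSE keeps any duplicated column names untouched.
out <- data.frame(
  file = basename(filelist), out,
  check.names = FALSE, stringsAsFactors = FALSE
)
out
```

This relies on every file contributing exactly one row; if a file could contribute more, the file name would need to be attached inside the function passed to lapply instead.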
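Side notes:
Your use of list.files strikes me as a little odd, though perhaps you have a reason. I tend to always use full.names=TRUE, because I need the paths to work regardless of my working directory. In particular, you set path to a specific directory, which need not be your working directory, so when you read the files you have to paste the directory back together with the file name. Also, a minor point: your pattern is probably fine, but if somebody creates a file named quux.XlSx (mixed case), you won't see it. ignore.case=TRUE allows for that.
I'd suggest: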
filelist <- list.files(
  path = "C:/Excel Files",
  pattern = '\\.xlsx$',  # a regular expression, not a glob: names ending in ".xlsx"
  ignore.case = TRUE,
  full.names = TRUE,
  recursive = FALSE
)
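Note that list.files treats pattern as a regular expression rather than a shell-style glob. The following small check, using made-up file names, sketches how an anchored pattern combined with ignore.case=TRUE behaves, and how utils::glob2rx can translate a glob if that notation is preferred:

```r
# Made-up file names for illustration only.
candidates <- c("quux.xlsx", "QUUX.XLSX", "quux.XlSx",
                "notes_xlsx.txt", "quux.xlsx.bak", "quux.xls")

# Anchored regular expression: a literal dot, then "xlsx" at the end of the name.
strict <- grepl("\\.xlsx$", candidates, ignore.case = TRUE)
candidates[strict]

# glob2rx() converts a glob such as "*.xlsx" into an equivalent regular expression.
from_glob <- grepl(utils::glob2rx("*.xlsx"), candidates, ignore.case = TRUE)
identical(strict, from_glob)
```

Only the three names that really end in ".xlsx" (in any letter case) are selected; names that merely contain "xlsx" somewhere, or that carry a further extension, are left out.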