How to create a data frame in a loop

Date: 2020-02-06 16:00:05

Tags: r dataframe dplyr

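I asked this question before, but it didn't actually resolve anything. I've done more work on it since and am stuck again!

I have a spreadsheet with two tabs. The first contains three cells I'm interested in (A2, A4, A6), which hold identifying details. The second has a 4x4 grid (A1:D4) containing some financial information.

I can build a data frame, locate the data and, to some extent, extract it. My problem is looping the whole thing over all the files in a folder, pulling the data from each one and writing it into a pre-created data frame.

The code below is for reference.

Find the files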

    # Assign the result so that file.list exists for the steps below
    file.list <- list.files(
      path = "C:/Excel Files",
      pattern = '*.xlsx|*.XLSX',
      full.names = FALSE,
      recursive = FALSE
    )

Create the df

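    # Cell references must be quoted strings; A2 and A4 occur on both sheets, so they appear twice
    colnames <- c("A2", "A4", "A6", "A1", "B1", "C1", "D1", "A2", "B2", "C2", "D2", "A3", "B3", "C3", "D3", "A4", "B4", "C4", "D4")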

    output <- matrix(NA,nrow = length(file.list), ncol = length(colnames), byrow = FALSE)
    colnames(output) <- c(colnames)
    rownames(output) <- c(file.list)

Extract the data

    FirmData1 <- readxl::read_xlsx("N:/Excel Files/test.xlsx", sheet = 2, range = "A1:D1", na = "", col_names = FALSE, col_types = "text")
    FirmData2 <- readxl::read_xlsx("N:/Excel Files/test.xlsx", sheet = 2, range = "A2:D2", na = "", col_names = FALSE, col_types = "text")
    FirmData3 <- readxl::read_xlsx("N:/Excel Files/test.xlsx", sheet = 2, range = "A3:D3", na = "", col_names = FALSE, col_types = "text")
    FirmData4 <- readxl::read_xlsx("N:/Excel Files/test.xlsx", sheet = 2, range = "A4:D4", na = "", col_names = FALSE, col_types = "text")

    FirmData <-  dplyr:: bind_rows(FirmData1, FirmData2, FirmData3, FirmData4)
    FirmData <- t(FirmData)
    colnames(output)

    Firm <- dplyr:: bind_rows(FirmInfo, FirmData) %>%
      tidyr:: spread(key = Field, value = Value)

Loop

There is no loop!

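1 Answer:

Answer 0: (score: 0)

Here is one way to loop over the files and combine the results.

I'll start by creating a spreadsheet to work with. I'm using openxlsx, but only to create the file, not to read it (for reading I'll still use readxl).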

wb <- openxlsx::createWorkbook()
openxlsx::addWorksheet(wb, "FirstSheet")
openxlsx::writeDataTable(wb, "FirstSheet", data.frame(t(outer(c("A","B"), 1:6, paste0))), colNames = FALSE)
openxlsx::addWorksheet(wb, "SecondSheet")
openxlsx::writeDataTable(wb, "SecondSheet", mtcars[1:4, 1:4], colNames = FALSE)
openxlsx::saveWorkbook(wb, "quux.xlsx")

readxl::read_xlsx("quux.xlsx", "FirstSheet", range = c("A2:A6"), col_names = "A")
# # A tibble: 5 x 1
#   A    
#   <chr>
# 1 A2   
# 2 A3   
# 3 A4   
# 4 A5   
# 5 A6   
readxl::read_xlsx("quux.xlsx", "SecondSheet", range = c("A1:D4"), col_names = LETTERS[1:4])
# # A tibble: 4 x 4
#       A     B     C     D
#   <dbl> <dbl> <dbl> <dbl>
# 1  21       6   160   110
# 2  21       6   160   110
# 3  22.8     4   108    93
# 4  21.4     6   258   110

First, a demonstration of what we want to do for each file:

fn <- "quux.xlsx"
first  <- readxl::read_xlsx(fn,  "FirstSheet", range = "A2:A6", col_names = "A")
second <- readxl::read_xlsx(fn, "SecondSheet", range = "A1:D4", col_names = LETTERS[1:4])
data.frame(matrix(first$A[c(1,3,5)], nrow = 1), stringsAsFactors = FALSE)
#   X1 X2 X3
# 1 A2 A4 A6
data.frame(matrix(t(second), nrow = 1))
#   X1 X2  X3  X4 X5 X6  X7  X8   X9 X10 X11 X12  X13 X14 X15 X16
# 1 21  6 160 110 21  6 160 110 22.8   4 108  93 21.4   6 258 110

Granted, the names are boring, but that is just aesthetics and can be fixed with colnames.
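As a minimal sketch of that renaming step: the stand-in values below are copied from the output shown above, and the `id_` and `fin_` prefixes are illustrative assumptions of mine, not something from the question. They are only there so that A2 and A4, which exist on both sheets, do not end up as duplicate names.

```r
# Stand-ins for the two one-row frames built above (values copied from the printed output)
ids  <- data.frame(matrix(c("A2", "A4", "A6"), nrow = 1), stringsAsFactors = FALSE)
grid <- data.frame(matrix(c(21, 6, 160, 110, 21, 6, 160, 110,
                            22.8, 4, 108, 93, 21.4, 6, 258, 110), nrow = 1))

# Hypothetical naming scheme: "id_" for sheet-1 cells, "fin_" for sheet-2 cells
colnames(ids) <- paste0("id_", c("A2", "A4", "A6"))

# matrix(t(second), nrow = 1) flattens the grid row by row,
# so the cell order is A1, B1, C1, D1, A2, B2, ...
cells <- as.vector(outer(LETTERS[1:4], 1:4, paste0))
colnames(grid) <- paste0("fin_", cells)

one_row <- cbind(ids, grid)
names(one_row)
```

With unique names in place, the combined frame is also easier to index by column later on.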

Now let's lapply over all of the files and combine the results into a single frame.

filelist <- c("quux.xlsx", "quux.xlsx", "quux.xlsx")
datlist <- lapply(filelist, function(fn) {
  first  <- readxl::read_xlsx(fn,  "FirstSheet", range = "A2:A6", col_names = "A")
  second <- readxl::read_xlsx(fn, "SecondSheet", range = "A1:D4", col_names = LETTERS[1:4])
  cbind(
    data.frame(matrix(first$A[c(1,3,5)], nrow = 1), stringsAsFactors = FALSE),
    data.frame(matrix(t(second), nrow = 1))    
  )
})
out <- do.call(rbind, datlist)
out
#   X1 X2 X3 X1 X2  X3  X4 X5 X6  X7  X8   X9 X10 X11 X12  X13 X14 X15 X16
# 1 A2 A4 A6 21  6 160 110 21  6 160 110 22.8   4 108  93 21.4   6 258 110
# 2 A2 A4 A6 21  6 160 110 21  6 160 110 22.8   4 108  93 21.4   6 258 110
# 3 A2 A4 A6 21  6 160 110 21  6 160 110 22.8   4 108  93 21.4   6 258 110
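The question labelled each row with its source file. One way to keep that information, sketched here with made-up stand-in data rather than real workbooks, is to add a column after combining. The paths and the `file` column name are hypothetical, and this assumes the rows of the result are in the same order as the file list, which holds for lapply followed by rbind.

```r
# Hypothetical file paths and stand-in per-file results (one row per file)
filelist <- c("C:/Excel Files/a.xlsx", "C:/Excel Files/b.xlsx")
datlist <- list(
  data.frame(id = "firm1", x = 1, stringsAsFactors = FALSE),
  data.frame(id = "firm2", x = 2, stringsAsFactors = FALSE)
)

out <- do.call(rbind, datlist)

# Record where each row came from; basename() drops the directory part
out$file <- basename(filelist)
out
```

A column is usually easier to filter and join on than row names, though `rownames(out) <- basename(filelist)` would work as well as long as the file names are unique.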

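Side notes:

Your use of list.files seems a little odd to me, though perhaps you have a reason for it. I tend to always use full.names=TRUE, because I want the result to work regardless of my working directory. In particular, you are setting path to a directory that may well not be the working directory, so you then have to paste the directory back onto each filename when reading the file. Also, though it is minor: your pattern is probably fine, but if somebody creates a file named quux.XlSx (mixed case), you won't see it. Consider using ignore.case=TRUE.

I suggest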

filelist <- list.files(
  path = "C:/Excel Files",
  pattern = '*.xlsx',
  ignore.case = TRUE,
  full.names = TRUE,
  recursive = FALSE
)
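One further point, offered as a sketch rather than a correction: list.files treats pattern as a regular expression, not a shell glob, so a stricter alternative is to anchor on the extension. The file names below are hypothetical, and grepl stands in for the matching so the example runs without a real directory.

```r
# Hypothetical file names, used instead of a real directory listing
files <- c("quux.xlsx", "quux.XlSx", "notes.xlsx.bak", "xlsx_readme.txt")

# "\\.xlsx$" means: a literal dot, then "xlsx", at the very end of the name
is_xlsx <- grepl("\\.xlsx$", files, ignore.case = TRUE)
files[is_xlsx]
# [1] "quux.xlsx" "quux.XlSx"
```

The same string could be passed as `pattern = "\\.xlsx$"` in the list.files call above, together with `ignore.case = TRUE`.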