Question

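By searching this site, and with a lot of help from its users, I have been able to load multiple xlsx files into R, either as separate data frames or as a single object containing multiple data frames: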

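library(xlsx)  # provides read.xlsx2
Folder <- "I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/"
Files <- list.files(path = Folder, pattern = "\\.xlsx$")  # escape the dot so only .xlsx files match
# Read sheet 1 of every workbook; simplify = FALSE keeps the result a named list
x <- sapply(paste0(Folder, Files), read.xlsx2, as.data.frame = TRUE, sheetIndex = 1, simplify = FALSE)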

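With the code above I can call a df as x$~2015-B1-2OR.xlsx, but how can I iterate over each of these objects? Is there an easier way to approach the problem than setting up a loop over a large number of data frames (~200)?

Example data, if I call the object x: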

$`I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/2015-X2-2OR.xlsx`

  Year Day Tank depth.1 depth.2 mid.depth   S
1 2015 2OR   X2    0.11   0.135    0.1225 4.1
2 2015 2OR   X2   0.135    0.16    0.1475 5.6

$`I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/2015-X2-OR10.xlsx`

  Year  Day Tank depth.1 depth.2 mid.depth   S
1 2015 OR10   X2   0.075     0.1    0.0875 4.6
2 2015 OR10   X2     0.1   0.125    0.1125 4.2
3 2015 OR10   X2   0.125    0.16    0.1425 5.2
4 2015 OR10   X2    0.16   0.175    0.1675 5.2
5 2015 OR10   X2   0.175     0.2    0.1875 6.8
6 2015 OR10   X2     0.2   0.225    0.2125 7.5
7 2015 OR10   X2   0.225    0.25    0.2375 8.8

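You can see that each level of x contains multiple columns and rows. How do I iterate over the levels of x and call specific columns?

For example, lapply(x, nrow) lists the number of rows in each level, but what if I want to return nrow for a specific column?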

Answer 1

First, I think it is a good idea to simplify the names of the list x:

names(x) <- gsub("^I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/|\\.xlsx","",names(x))
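The same renaming can also be done without hard-coding the folder in a regular expression. The sketch below assumes the list names are full file paths, as in the question; `x_demo` is a hypothetical stand-in for `x` so the snippet runs without the Excel files.

```r
# Hypothetical stand-in for the list `x`: the names are full file paths.
x_demo <- list(
  "I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/2015-X2-2OR.xlsx" =
    data.frame(S = c(4.1, 5.6)),
  "I:/Marcs_Discretinization_try_1/Attempt1/Actual Data/actualdata/2015-X2-OR10.xlsx" =
    data.frame(S = c(4.6, 4.2, 5.2))
)

# basename() drops the directory part; file_path_sans_ext() drops ".xlsx".
names(x_demo) <- tools::file_path_sans_ext(basename(names(x_demo)))

names(x_demo)
```

After this, an element can be reached with a short name, for example x_demo[["2015-X2-2OR"]].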

Since your data is too complex to reproduce here, I made a small list to work with instead:

A <- structure(list(A1 = structure(list(x = structure(c(1L, 1L, 2L, 
3L, 2L), .Label = c("a", "b", "c"), class = "factor"), y = c(0.00840516341850162, 
0.970356883713976, 0.469053473789245, 0.847559429006651, 0.646102252649143
)), .Names = c("x", "y"), row.names = c(NA, -5L), class = "data.frame"), 
    A2 = structure(list(x = structure(c(1L, 1L, 2L, 3L, 2L), .Label = c("a", 
    "b", "c"), class = "factor"), y = c(0.599587128963321, 0.390590411843732, 
    0.11197471502237, 0.824506989680231, 0.608971498440951)), .Names = c("x", 
    "y"), row.names = c(NA, -5L), class = "data.frame"), A3 = structure(list(
        x = structure(c(1L, 1L, 2L, 3L, 2L), .Label = c("a", 
        "b", "c"), class = "factor"), y = c(-2.61798606622622, 
        0.696978535260992, -0.758098875328806, -1.08656950056061, 
        1.3469375195447), z = c(0.346128243254498, 0.691243288572878, 
        0.285317465662956, 0.125597422709689, 0.0258157614152879
        )), .Names = c("x", "y", "z"), row.names = c(NA, -5L), class = "data.frame")), .Names = c("A1", 
"A2", "A3"))

Now, I think you can use lapply to do whatever you want to each data frame:

# number of rows (observations) in each data frame
lapply(A, nrow)

library(data.table)
# convert each data frame to a data.table and keep the result
A <- lapply(A, setDT)
# for each data frame A[[i]], sum y within each level of x, largest sum first
lapply(A, function(j) j[, sum(y), by = x][order(-V1)])
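If you would rather not depend on data.table, the same grouped sum can be sketched in base R with aggregate(). This is an alternative to the code above, not a replacement for it; `A_demo` is a hypothetical list with the same column layout (x, y) as the example data.

```r
# Hypothetical list of data frames with a grouping column x and a value column y.
A_demo <- list(
  A1 = data.frame(x = factor(c("a", "a", "b", "c", "b")), y = c(1, 2, 3, 4, 5)),
  A2 = data.frame(x = factor(c("a", "a", "b", "c", "b")), y = c(10, 0, 1, 2, 3))
)

# For each data frame: sum y within each level of x, then sort by the sum, largest first.
sums_by_x <- lapply(A_demo, function(df) {
  res <- aggregate(y ~ x, data = df, FUN = sum)
  res[order(-res$y), ]
})

sums_by_x$A1
```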

And anything else you want to do...
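For the specific case in the question, working on one column of every data frame, an anonymous function that picks the column by name is usually enough. The sketch below assumes a column called S, as in the sample output; `x_demo` is a hypothetical stand-in for the real list. Columns read with read.xlsx2 may arrive as character or factor, so the helper converts through as.character() before as.numeric(); that conversion is harmless if the column is already numeric.

```r
# Hypothetical stand-in for the list `x`, using column names from the question.
x_demo <- list(
  "2015-X2-2OR"  = data.frame(mid.depth = c(0.1225, 0.1475), S = c(4.1, 5.6)),
  "2015-X2-OR10" = data.frame(mid.depth = c(0.0875, 0.1125, 0.1425), S = c(4.6, 4.2, 5.2))
)

# Helper (name is illustrative): return column S of one data frame as numeric.
get_S <- function(df) as.numeric(as.character(df$S))

# Number of non-missing values of S in each data frame.
n_S <- sapply(x_demo, function(df) sum(!is.na(get_S(df))))

# Mean of S in each data frame.
mean_S <- sapply(x_demo, function(df) mean(get_S(df)))

# Just the S column of every data frame, kept as a list.
S_list <- lapply(x_demo, get_S)

n_S
mean_S
```

sapply() is used here because each call returns a single number, so the result collapses to a named vector; lapply() keeps a list, which suits results of varying length.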
