Question

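My data is arranged as follows:

A list
Each element of the list is a data frame (50 data frames in total)
Each data frame contains 5 rows of numbers and 9 named columns (all 50 data frames share the same 9 column names)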

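My goal is to rearrange this data efficiently:

The "second dimension" (1 to 5 in the description above) becomes the first dimension.
The "first dimension" (1 to 50 in the description above) becomes the second dimension.
I keep only some of the 9 named columns, selected by name (the rest can be discarded).
I want all the numbers stored in a single array (or any other more efficient data structure) instead of these inefficient lists and data frames.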

Example data can be generated with the following code (simplified to only 2 data frames, each with 5 rows and 3 columns):

example_list<-lapply(X=1:2, FUN=function(X){setNames(data.frame(X*c(1:5), -X*c(1:5), X*100*c(1:5)), c("C1", "C2", "C3"))})

This creates the following list of two data frames:

> example_list[[1]]

  C1 C2  C3
1  1 -1 100
2  2 -2 200
3  3 -3 300
4  4 -4 400
5  5 -5 500

> example_list[[2]]

  C1  C2   C3
1  2  -2  200
2  4  -4  400
3  6  -6  600
4  8  -8  800
5 10 -10 1000

My current solution (with the dimensions hard-coded for the example data) is shown below. Here I assume we only care about the columns named "C1" and "C2":

important_cols <- c("C1", "C2")
result <- array(0, c(5, 2, length(important_cols)))
for(i in 1:5){
    for(j in 1:2){
        result[i,j,] <- c(example_list[[j]][i,important_cols], recursive=TRUE)
    }
}

This gives the following output:

> result
, , 1

     [,1] [,2]
[1,]    1    2
[2,]    2    4
[3,]    3    6
[4,]    4    8
[5,]    5   10

, , 2

     [,1] [,2]
[1,]   -1   -2
[2,]   -2   -4
[3,]   -3   -6
[4,]   -4   -8
[5,]   -5  -10

For example, result[5,2,] = [10, -10] corresponds to the 5th row of the 2nd data frame in the original data (with the third column dropped).
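The index mapping described above can be checked directly against the source list. The following is only a small sketch on the 2-data-frame example; the names `i`, `j`, and `row_values` are introduced here for illustration and are not part of the original code.

```r
# Rebuild the example data from the question (2 data frames, 5 rows, 3 columns).
example_list <- lapply(1:2, function(X) {
  setNames(data.frame(X * 1:5, -X * 1:5, X * 100 * 1:5), c("C1", "C2", "C3"))
})
important_cols <- c("C1", "C2")

# Element [i, j, ] of the target array should hold row i of data frame j,
# restricted to the columns we keep.
i <- 5
j <- 2
row_values <- unlist(example_list[[j]][i, important_cols], use.names = FALSE)
print(row_values)  # the values 10 and -10
```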

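The solution above works, but I can't help suspecting that there is a significantly more efficient approach than a hand-written double for loop that sets the elements one at a time.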

Answer 1

You can use a couple of lapply calls and purrr::transpose to avoid the loops:

# Example
N <- 1e5
example_list <-
  lapply(
    X = 1:2,
    FUN = function(X) {
      setNames(data.frame(X * c(1:N), -X * c(1:N), X * 100 * c(1:N)), c("C1", "C2", "C3"))
    }
  )

important_cols <- c("C1", "C2")    

# Your solution -> 58 seconds :O
system.time({
  result <- array(0, c(N, 2, length(important_cols)))
  for(i in 1:N){
    for(j in 1:2){
      result[i,j,] <- c(example_list[[j]][i,important_cols], recursive=TRUE)
    }
  }
})

# Solution with purrr::transpose -> 0 sec    
library(magrittr)  ## for the %>%
system.time({
  result2 <- example_list %>%
    lapply(function(df) df[important_cols]) %>%
    purrr::transpose() %>%
    sapply(function(l) do.call(cbind, l))
})
dim(result2) <- c(nrow(example_list[[1]]), 
                  length(example_list), 
                  length(important_cols))

# Verification
all.equal(result, result2)
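
If you would rather not depend on purrr, the same rearrangement can probably be done in base R alone. The sketch below is an alternative to the code above, not part of it: it converts each data frame to a matrix, stacks the matrices with `simplify2array`, and reorders the dimensions with `aperm`. It assumes every data frame has the same number of rows and that the selected columns are all numeric. The names `mats`, `stacked`, and `result3` are introduced here for illustration.

```r
# Small example data, as in the question (2 data frames, 5 rows, 3 columns).
example_list <- lapply(1:2, function(X) {
  setNames(data.frame(X * 1:5, -X * 1:5, X * 100 * 1:5), c("C1", "C2", "C3"))
})
important_cols <- c("C1", "C2")

# Convert each data frame to a numeric matrix holding only the wanted columns.
mats <- lapply(example_list, function(df) as.matrix(df[important_cols]))

# Stack the equally sized matrices into a rows x columns x frames array.
stacked <- simplify2array(mats)

# Swap the last two dimensions to get rows x frames x columns,
# which is the layout the question asks for.
result3 <- aperm(stacked, c(1, 3, 2))

dim(result3)     # 5 2 2
result3[5, 2, ]  # row 5 of data frame 2: 10 (C1) and -10 (C2)
```

Unlike the `dim<-` approach, this version keeps the column names as the dimnames of the third dimension, so a comparison against the loop result would need `check.attributes = FALSE` in `all.equal` or a call to `unname` first.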
