Question

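I have two requirements, both related to a dataset similar to the reproducible one below. I have a list of 18 entities, each of which is a list of 17-19 data.frames. Here is a reproducible dataset (it uses matrices instead of data.frames, but I don't think that makes a difference):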

test <- list(list(matrix(10:(50-1), ncol = 10), matrix(60:(100-1), ncol = 10), matrix(110:(150-1), ncol = 10)),
             list(matrix(200:(500-1), ncol = 10), matrix(600:(1000-1), ncol = 10), matrix(1100:(1500-1), ncol = 10)))

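First, I need to split each data.frame/matrix into two parts (by a given number of rows) and save the results to a new list of lists.
Second, I need to extract a given column from each data.frame in the list of lists and save it.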

I don't know how to do this other than with for(), but I believe the apply() family of functions should work.

Thanks for reading.

Edit

My expected output is as follows:

extractedColumns <- list(list(matrix(10:(50-1), ncol = 10)[, 2], matrix(60:(100-1), ncol = 10)[, 2], matrix(110:(150-1), ncol = 10)[, 2]),
                         list(matrix(200:(500-1), ncol = 10)[, 2], matrix(600:(1000-1), ncol = 10)[, 2], matrix(1100:(1500-1), ncol = 10)[, 2]))


numToSubset <- 3
substetFrames <- list(list(list(matrix(10:(50-1), ncol = 10)["first length - numToSubset rows", ], matrix(10:(50-1), ncol = 10)["last numToSubset rows", ]), 
                           list(matrix(60:(100-1), ncol = 10)["first length - numToSubset rows", ], matrix(60:(100-1), ncol = 10)["last numToSubset rows", ]),
                                list(matrix(110:(150-1), ncol = 10)["first length - numToSubset rows", ], matrix(110:(150-1), ncol = 10)["last numToSubset rows", ])),
                      etc...)

It looks very messy, but I hope you can follow what I want.
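For the first requirement (splitting by rows), a minimal sketch using the nested-lapply pattern from the answers could look like this. It assumes numToSubset is no larger than the number of rows in each matrix, and the helper splitRows is a hypothetical name, not a base R function. Row indexing uses drop = FALSE so that a part containing a single row stays a matrix (or data.frame) instead of being dropped to a vector.

```r
# Reproducible data from the question
test <- list(list(matrix(10:(50-1), ncol = 10), matrix(60:(100-1), ncol = 10), matrix(110:(150-1), ncol = 10)),
             list(matrix(200:(500-1), ncol = 10), matrix(600:(1000-1), ncol = 10), matrix(1100:(1500-1), ncol = 10)))

numToSubset <- 3

# Hypothetical helper: split m into (first nrow(m) - n rows, last n rows).
# Assumes n <= nrow(m). Works for both matrices and data.frames.
splitRows <- function(m, n) {
  k <- nrow(m) - n
  list(m[seq_len(k), , drop = FALSE],      # leading rows
       m[k + seq_len(n), , drop = FALSE])  # last n rows
}

# Outer lapply walks the entities, inner lapply walks the matrices of each entity
subsetFrames <- lapply(test, function(entity) lapply(entity, splitRows, n = numToSubset))

# The same helper applied to a data.frame
dfParts <- splitRows(as.data.frame(test[[1]][[1]]), numToSubset)
```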

Answer 1

You can use two nested lapply calls:

lapply(test, function(x) lapply(x, '[', c(2, 3)))

Output:

[[1]]
[[1]][[1]]
[1] 11 12

[[1]][[2]]
[1] 61 62

[[1]][[3]]
[1] 111 112


[[2]]
[[2]][[1]]
[1] 201 202

[[2]][[2]]
[1] 601 602

[[2]][[3]]
[1] 1101 1102

Explanation

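The first lapply is applied to each of the two lists in test. Each of these two lists contains 3 matrices. The second lapply iterates over those 3 matrices and subsets each one with c(2, 3), using the '[' function as the FUN argument of the second lapply.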
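Since the question mentions for(), it may help to see the same computation written as explicit loops. This is a sketch for comparison only; viaLapply and viaLoop are illustrative names, not part of the answer.

```r
test <- list(list(matrix(10:(50-1), ncol = 10), matrix(60:(100-1), ncol = 10), matrix(110:(150-1), ncol = 10)),
             list(matrix(200:(500-1), ncol = 10), matrix(600:(1000-1), ncol = 10), matrix(1100:(1500-1), ncol = 10)))

# Nested lapply from the answer
viaLapply <- lapply(test, function(x) lapply(x, '[', c(2, 3)))

# Equivalent explicit loops: the outer loop plays the role of the first lapply,
# the inner loop plays the role of the second one
viaLoop <- vector("list", length(test))          # one slot per entity
for (i in seq_along(test)) {
  inner <- vector("list", length(test[[i]]))     # one slot per matrix
  for (j in seq_along(test[[i]])) {
    inner[[j]] <- test[[i]][[j]][c(2, 3)]        # same as `[`(test[[i]][[j]], c(2, 3))
  }
  viaLoop[[i]] <- inner
}

identical(viaLapply, viaLoop)
# [1] TRUE
```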

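Note: on a matrix, this lapply subsets elements 2 and 3 (a single index treats the matrix as a vector in column-major order), but the same function subsets columns 2 and 3 when used on a data.frame.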
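The difference described in the note can be checked directly on the first matrix of test; m and df are just illustrative names.

```r
m  <- matrix(10:(50-1), ncol = 10)   # first matrix of `test` (4 rows x 10 columns)
df <- as.data.frame(m)               # same values as a data.frame (columns V1..V10)

# Single-index `[` on a matrix treats it as a vector in column-major order,
# so this returns elements 2 and 3, i.e. rows 2-3 of column 1
m[c(2, 3)]
# [1] 11 12

# A data.frame is a list of columns, so the same call selects columns 2 and 3
df[c(2, 3)]

# To select columns of a matrix, use the two-index form with an empty row index
m[, c(2, 3)]
```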

Subsetting rows and columns

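Using an anonymous function is very flexible. By changing the code to:

# change rows and columns into what you need (example values below)
rows <- 1:2
columns <- 2
lapply(test, function(x) lapply(x, function(y) y[rows, columns]))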


This lets you select any combination of rows and columns you need.
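Applied to the question's second requirement, the anonymous-function form with an empty row index should reproduce the asker's extractedColumns. A sketch, where colToExtract is a hypothetical variable name:

```r
test <- list(list(matrix(10:(50-1), ncol = 10), matrix(60:(100-1), ncol = 10), matrix(110:(150-1), ncol = 10)),
             list(matrix(200:(500-1), ncol = 10), matrix(600:(1000-1), ncol = 10), matrix(1100:(1500-1), ncol = 10)))

# Expected output from the question
extractedColumns <- list(list(matrix(10:(50-1), ncol = 10)[, 2], matrix(60:(100-1), ncol = 10)[, 2], matrix(110:(150-1), ncol = 10)[, 2]),
                         list(matrix(200:(500-1), ncol = 10)[, 2], matrix(600:(1000-1), ncol = 10)[, 2], matrix(1100:(1500-1), ncol = 10)[, 2]))

colToExtract <- 2   # hypothetical: the column to keep

# Empty row index = all rows; column index = the column to extract
result <- lapply(test, function(x) lapply(x, function(y) y[, colToExtract]))

identical(result, extractedColumns)
# [1] TRUE
```

On data.frames, y[, colToExtract] likewise returns a plain vector when a single column is selected; adding drop = FALSE keeps a one-column data.frame instead.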

Answer 2

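To piggyback on @LyzandeR's answer, consider rapply, the often-overlooked sibling in the apply family. It runs a function recursively over a list of vectors/matrices and returns the same kind of nested structure, so it can generally be compared with a nested lapply or its variants vapply/sapply: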

newtest1 <- lapply(test, function(x) lapply(x, '[', c(2, 3)))

newtest2 <- rapply(test, function(x) `[`(x, c(2, 3)), classes="matrix", how="list")

all.equal(newtest1, newtest2)
# [1] TRUE
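rapply can also cover the first requirement (splitting rows): with how = "list", each matching leaf is replaced by whatever the function returns, so returning a two-element list per matrix yields the nested list of pairs. A sketch on the matrix-based test data; splitLeaves is an illustrative name and numToSubset is taken from the question.

```r
test <- list(list(matrix(10:(50-1), ncol = 10), matrix(60:(100-1), ncol = 10), matrix(110:(150-1), ncol = 10)),
             list(matrix(200:(500-1), ncol = 10), matrix(600:(1000-1), ncol = 10), matrix(1100:(1500-1), ncol = 10)))

numToSubset <- 3

# Each matrix leaf is replaced by list(leading rows, last numToSubset rows);
# the surrounding list-of-lists nesting is kept by how = "list"
splitLeaves <- rapply(test,
                      function(m) {
                        k <- nrow(m) - numToSubset   # assumes numToSubset <= nrow(m)
                        list(m[seq_len(k), , drop = FALSE],
                             m[k + seq_len(numToSubset), , drop = FALSE])
                      },
                      classes = "matrix", how = "list")
```

Because data.frames are themselves lists, rapply may descend into their columns instead of treating each data.frame as a leaf, so on the real data.frame-based data this is worth checking before relying on it.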

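Interestingly, and to my surprise, rapply runs slower than the nested lapply in this use case (a mean of about 80 versus 32 microseconds in the runs below)! Well, back to the lab I go...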

library(microbenchmark)

microbenchmark(newtest1 <- lapply(test, function(x) lapply(x, '[', c(2, 3))))    
# Unit: microseconds
#     mean median     uq    max neval
# 31.92804 31.278 32.241 74.587   100

microbenchmark(newtest2 <- rapply(test, function(x) `[`(x, c(2, 3)),
                                        classes="matrix", how="list"))    
# Unit: microseconds
#    min    lq     mean median      uq    max neval
# 69.293 72.18 79.53353 73.143 74.5865 219.91   100

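Even more interestingly, when the bare '[' call is replaced by the equivalent matrix indexing y[c(2, 3), 1], the nested lapply gets slightly faster (median about 29 versus 31 microseconds) and rapply slightly slower!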

microbenchmark(newtest3 <- lapply(test, function(x) 
                                  lapply(x, function(y) y[c(2, 3), 1])))
# Unit: microseconds
#    min     lq     mean median     uq    max neval
# 26.947 28.391 32.00987 29.354 30.798 100.09   100

all.equal(newtest1, newtest3)
# [1] TRUE

microbenchmark(newtest4 <- rapply(test, function(x) x[c(2,3), 1], 
                                  classes="matrix", how="list"))
# Unit: microseconds
#    min     lq     mean median     uq     max neval
# 74.105 76.752 80.37076 77.955 78.918 203.549   100

all.equal(newtest2, newtest4)
# [1] TRUE

R - Extracting information from a list of lists of data.frames