Question

I usually prefer lapply() over a for loop:

lx <- split( x, x$hr)  # with the next step being lapply( lx, function( x) ...)

But now every element of lx still contains the hr column, which is wasteful because that information is already in names( lx).

So now I have to do this:

lx <- lapply( lx, function( X) select( X, -hr))

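(Another option is:

HR <- x$hr              # keep the grouping values before dropping the column
x2 <- select( x, -hr)   # drop hr from the data frame itself, not from the list
lx <- split( x2, HR)

)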

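The whole point of using lapply() instead of a for loop is to be efficient, so these extra lines bother me. This seems like a common use case, and in my experience R usually has something more efficient, or I am missing something.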

Can this be done with a single function call or a one-liner?

Edit: a concrete example

library( dplyr)  # select() below is assumed to be dplyr::select
DF <- data.frame( A = 1:2, B = 2:3, C = 3:4)
DF <- split( DF, factor( DF$A))  # but each list element still contains the column A which is
                                 # redundant (because the names() of the list element equals A 
                                 # as well), so I have to write the following line if I want 
                                 # to be efficient especially with large datasets
DF <- lapply( DF, function( x) select( x, -A))  # I hate always writing this line!

Answer 1

Drop the split column first:

split(DF[-1], DF[[1]])

or

split(subset(DF, select = -A), DF$A)

Update: added the last line.
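
As a minimal sketch, here are both calls applied to the DF from the question, using base R only. The helper name split_drop is hypothetical (it is not part of any package) and is only there to show how the idea could be wrapped for reuse:

```r
# Example data from the question
DF <- data.frame(A = 1:2, B = 2:3, C = 3:4)

# Drop the grouping column by position, then split by its values
by_pos <- split(DF[-1], DF[[1]])

# Same idea, dropping the column by name
by_name <- split(subset(DF, select = -A), DF$A)

# Hypothetical helper (not from any package): split `data` by the column
# named `col` and leave that column out of every piece
split_drop <- function(data, col) {
  split(data[setdiff(names(data), col)], data[[col]])
}
by_helper <- split_drop(DF, "A")

names(by_pos)         # group labels taken from the values of A
names(by_pos[["1"]])  # columns remaining in each piece
```

Because the grouping vector is passed to split() separately, the column never has to be part of the data being split, so no follow-up lapply() is needed to remove it.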
