Question

After grouping a data.frame, I can no longer select the second column:

d <- data.frame(x = 1:10, y = runif(1))
d[,2] # selects the second column
d <- group_by(d, x)
d[,2] # produces the error: index out of bounds

Answer 1

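I believe this is the intended behavior for grouped_df objects in dplyr: the logic is that you cannot drop the grouping variable while the data is still grouped. Consider this example, where I use dplyr's select function to extract a variable from a grouped_df: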

require(dplyr)
d <- data.frame(x = 1:10, y = runif(1), z  = rnorm(2))
d <- group_by(d, x)

select(d, y)  
#Source: local data frame [10 x 2]
#Groups: x
#
#    x         y
#1   1 0.5861766
#2   2 0.5861766
#3   3 0.5861766
#4   4 0.5861766
#5   5 0.5861766
#6   6 0.5861766
#7   7 0.5861766
#8   8 0.5861766
#9   9 0.5861766
#10 10 0.5861766

You can see that the result includes the grouping variable, even though it was not specified in the select call.

select(d, z) # would work the same way
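The same point can be checked programmatically rather than by reading the printed output. This is only a minimal sketch, assuming dplyr is installed; the object names g, kept and kept_minus are hypothetical and not part of the original answer, and the exact messages printed may differ between dplyr versions.

```r
library(dplyr)

# Rebuild a small grouped data frame like the one in the answer
d <- data.frame(x = 1:10, y = runif(10), z = rnorm(10))
g <- group_by(d, x)

# Select a non-grouping column from the grouped data
kept <- select(g, y)
names(kept)        # includes "x" as well as "y"

# Try to drop the grouping column explicitly
kept_minus <- select(g, -x)
"x" %in% names(kept_minus)

# groups() lists the variables the data is grouped by
groups(g)
```

Inspecting names() this way makes it easier to see that the grouping column travels with the result, whatever the print method of the installed version looks like.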

Even if you explicitly exclude the grouping variable "x", select still returns it:

select(d, -x)
#Source: local data frame [10 x 3]
#Groups: x
#
#    x         y         z
#1   1 0.2110696 2.4393919
#2   2 0.2110696 0.8400083
#3   3 0.2110696 2.4393919
#4   4 0.2110696 0.8400083
#5   5 0.2110696 2.4393919
#6   6 0.2110696 0.8400083
#7   7 0.2110696 2.4393919
#8   8 0.2110696 0.8400083
#9   9 0.2110696 2.4393919
#10 10 0.2110696 0.8400083

To get only the "y" column, you need to ungroup the data first:

ungroup(d) %>% select(y)
#Source: local data frame [10 x 1]
#
#           y
#1  0.5861766
#2  0.5861766
#3  0.5861766
#4  0.5861766
#5  0.5861766
#6  0.5861766
#7  0.5861766
#8  0.5861766
#9  0.5861766
#10 0.5861766
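For the positional indexing used in the question, one possible workaround is to remove the grouping before subsetting. The following is a sketch under assumptions rather than the answer's own method: it assumes dplyr is installed, the names g, y_only and second_col are hypothetical, and behavior of `[` on grouped data may vary across dplyr versions.

```r
library(dplyr)

d <- data.frame(x = 1:10, y = runif(10))
g <- group_by(d, x)

# Option 1: ungroup, then select by name; only "y" remains
y_only <- g %>% ungroup() %>% select(y)
names(y_only)

# Option 2: convert back to a plain data.frame, then index by position
second_col <- as.data.frame(g)[, 2]
length(second_col)
```

Option 2 returns a plain vector, as base R does for a single-column subset, while option 1 keeps a one-column table.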

Note that subsetting with [ works as long as the subset includes the grouping variable, for example:

d[, 1:2]

or

d[, c(1,3)]

Error when selecting columns after grouping a data frame with group_by in dplyr 0.3.02
