I have a data frame as follows:
a <- data.frame(group =c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5), count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L,114L, 134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L))
group count
1 1 12
2 1 80
3 1 102
4 1 97
5 2 118
6 2 115
7 2 4
8 2 13
9 3 136
10 3 114
11 3 134
12 3 126
13 4 128
14 4 63
15 4 118
16 4 1
17 5 28
18 5 18
19 5 18
20 5 23
I used the following command:
library(dplyr)
a %>% group_by(group) %>% summarise(mean(count))
group mean(count)
(dbl) (dbl)
1 1 72.75
2 2 62.50
3 3 127.50
4 4 77.50
5 5 21.75
I want to keep only the entries that belong to the group with the highest mean. Here group 3 has the largest mean, so my output should be:
group count
1 3 136
2 3 114
3 3 134
4 3 126
Does anyone know how to do this?
Answer 0 (score: 4)
You want mutate instead of summarize, so that all observations are kept in the data.frame.
new_data <- a %>% group_by(group) %>%
##compute average count within groups
mutate(AvgCt = mean(count)) %>%
ungroup() %>%
##filter, looking for the maximum of the created variable
filter(AvgCt == max(AvgCt))
Then you have the final output:
> new_data
Source: local data frame [4 x 3]
group count AvgCt
(dbl) (int) (dbl)
1 3 136 127.5
2 3 114 127.5
3 3 134 127.5
4 3 126 127.5
And, if you wish to drop the computed variable:
new_data <- new_data %>% select(-AvgCt)
> new_data
Source: local data frame [4 x 2]
group count
(dbl) (int)
1 3 136
2 3 114
3 3 134
4 3 126
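The difference between summarise and mutate that this answer relies on can be sketched as follows. This is a minimal illustration on the question's data; the names by_summarise and by_mutate are made up for the example.

```r
library(dplyr)

# Same data as in the question
a <- data.frame(
  group = rep(c(1, 2, 3, 4, 5), each = 4),
  count = c(12L, 80L, 102L, 97L, 118L, 115L, 4L, 13L, 136L, 114L,
            134L, 126L, 128L, 63L, 118L, 1L, 28L, 18L, 18L, 23L)
)

# summarise() collapses each group to a single row: 5 rows in total
by_summarise <- a %>% group_by(group) %>% summarise(AvgCt = mean(count))

# mutate() keeps every observation and repeats the group mean: 20 rows
by_mutate <- a %>% group_by(group) %>% mutate(AvgCt = mean(count)) %>% ungroup()

nrow(by_summarise)
nrow(by_mutate)
```

Because the original rows survive the mutate step, they are still available for the filter that follows.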
Answer 1 (score: 4)
If you would like a base R solution, you can do this with which.max and aggregate:
# calculate means by group
myMeans <- aggregate(count~group, a, FUN=mean)
# select the group with the max mean
maxMeanGroup <- a[a$group == myMeans[which.max(myMeans$count),]$group, ]
As a second approach, you can try data.table:
library(data.table)
setDT(a)
# which.max() gives a row position, so use it to look up the group value
a[group == a[, list(count = mean(count)), by = group
             ][which.max(count), group]]
which returns
group count
1: 3 136
2: 3 114
3: 3 134
4: 3 126
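To see why the aggregate approach looks up the group value instead of using the result of which.max directly, here is a small sketch. The data frame b and its labels 10, 20, 30 are hypothetical and chosen so that row position and group label differ.

```r
# Hypothetical data: group labels are 10, 20, 30 rather than 1, 2, 3
b <- data.frame(group = rep(c(10, 20, 30), each = 2),
                count = c(1, 3, 50, 70, 5, 7))

# mean count per group
bMeans <- aggregate(count ~ group, b, FUN = mean)

# which.max() returns a row position within bMeans ...
pos <- which.max(bMeans$count)

# ... so index bMeans$group with it to recover the actual label
best <- bMeans$group[pos]

b[b$group == best, ]
```

With the question's data the labels happen to equal the positions, so the distinction is easy to miss.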
Answer 2 (score: 4)
Maybe some fun with xtabs / tabulate too (if the groups are not just numbers, you will need to add names to the which.max call):
a[a$group == which.max(xtabs(count ~ group, a) / tabulate(a$group)),]
# group count
# 9 3 136
# 10 3 114
# 11 3 134
# 12 3 126
Or with rowsum:
a[a$group == which.max(rowsum.default(a$count, a$group) / tabulate(a$group)), ]
# group count
# 9 3 136
# 10 3 114
# 11 3 134
# 12 3 126
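The remark about names can be sketched like this for non-numeric groups. The data frame d and its labels are hypothetical, and table() is swapped in for tabulate(), since tabulate() expects positive integers.

```r
# Hypothetical data with character group labels
d <- data.frame(group = rep(c("x", "y", "z"), each = 2),
                count = c(1, 3, 50, 70, 5, 7),
                stringsAsFactors = FALSE)

# per-group sum divided by per-group size gives the per-group mean
groupMeans <- xtabs(count ~ group, d) / as.vector(table(d$group))

# which.max() gives a position; names() turns it into the group label
best <- names(which.max(groupMeans))

d[d$group == best, ]
```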
Answer 3 (score: 2)
Using dplyr:
a %>% group_by(group) %>%
mutate(mc = mean(count)) %>% ungroup() %>%
filter(mc == max(mc)) %>% select(-mc)
Source: local data frame [4 x 2]
group count
(dbl) (int)
1 3 136
2 3 114
3 3 134
4 3 126
Another option with data.table:
# assumes a is already a data.table, e.g. after setDT(a)
a[a[, .(mc = mean(count)), .(group)][mc == max(mc), -"mc", with = FALSE], on = "group"]
group count
1: 3 136
2: 3 114
3: 3 134
4: 3 126
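One behavioural difference between the approaches in this thread is worth checking if ties are possible: comparing against max() keeps every group that shares the highest mean, while which.max() stops at the first one. Here is a base R sketch on hypothetical data; the names e, keep_all and keep_first are made up for the illustration.

```r
# Hypothetical data: groups 1 and 2 share the highest mean (10)
e <- data.frame(group = c(1, 1, 2, 2, 3, 3),
                count = c(8, 12, 10, 10, 1, 3))

# group mean repeated on every row, analogous to the mutate() step
mc <- ave(e$count, e$group, FUN = mean)

# comparing against the maximum keeps both tied groups
keep_all <- e[mc == max(mc), ]

# which.max() returns only the first maximum, so only one group remains
first_group <- e$group[which.max(mc)]
keep_first <- e[e$group == first_group, ]

nrow(keep_all)
nrow(keep_first)
```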