I have a dataset:
d1 <- data.frame(GROUP  = c("A","A","A","G","G","F","F","E","T"),
                 FIRST  = c(10,2,3,6,NA,NA,NA,1,NA),
                 SECOND = c(3,NA,NA,1,NA,4,2,1,NA),
                 THIRD  = c(5,7,NA,NA,NA,1,NA,1,1))
d1
  GROUP FIRST SECOND THIRD
1     A    10      3     5
2     A     2     NA     7
3     A     3     NA    NA
4     G     6      1    NA
5     G    NA     NA    NA
6     F    NA      4     1
7     F    NA      2    NA
8     E     1      1     1
9     T    NA     NA     1
I want to combine the data by the GROUP column in two ways:
Column-wise mean within each group:
  GROUP FIRST SECOND THIRD
1     A     5      3     6
2     G     6      1    NA
3     F    NA      3     1
4     E     1      1     1
5     T    NA     NA     1
Column-wise maximum within each group:
  GROUP FIRST SECOND THIRD
1     A    10      3     7
2     G     6      1    NA
3     F    NA      4     1
4     E     1      1     1
5     T    NA     NA     1
Is there a quick way to do this, or should I write a new function?
Answer 0 (score: 2)
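We can use aggregate from base R, with the data frame stored as d1:
# na.action = NULL keeps rows that contain NA; na.rm = TRUE is passed on to mean()
aggregate(. ~ GROUP, d1, mean, na.rm = TRUE, na.action = NULL)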
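One detail worth noting: with na.rm = TRUE, mean() returns NaN and max() returns -Inf (with a warning) for a group whose values are all NA, whereas the tables in the question show NA there. A minimal base R sketch that returns NA in that case follows. The helper name na_safe is hypothetical (it is not part of base R), and aggregate sorts the groups alphabetically, so the row order differs from the question's.

```r
# Sample data from the question, stored as d1 (the name the answer uses).
d1 <- data.frame(GROUP  = c("A","A","A","G","G","F","F","E","T"),
                 FIRST  = c(10,2,3,6,NA,NA,NA,1,NA),
                 SECOND = c(3,NA,NA,1,NA,4,2,1,NA),
                 THIRD  = c(5,7,NA,NA,NA,1,NA,1,1))

# Hypothetical helper: wrap f so that it is applied to the non-missing
# values only, and returns NA when a group has no non-missing values.
na_safe <- function(f) {
  function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) NA_real_ else f(x)
  }
}

# na.action = na.pass keeps rows containing NA so the helper sees them.
group_means <- aggregate(. ~ GROUP, data = d1, FUN = na_safe(mean),
                         na.action = na.pass)
group_maxes <- aggregate(. ~ GROUP, data = d1, FUN = na_safe(max),
                         na.action = na.pass)

print(group_means)
print(group_maxes)
```

Wrapping the summary function keeps the aggregate call itself unchanged, so the same helper can be reused for min, median, or any other function of a numeric vector.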
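Or use dplyr:
library(dplyr)
# Group means, ignoring NAs.
# Note: summarise_each() and funs() are deprecated in current dplyr releases.
d1 %>%
  group_by(GROUP) %>%
  summarise_each(funs(mean = mean(., na.rm = TRUE)))
Or, for the group maxima:
d1 %>%
  group_by(GROUP) %>%
  summarise_each(funs(max = max(., na.rm = TRUE)))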
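Since summarise_each() and funs() are deprecated, the same summaries can be sketched with summarise() and across(), which are available from dplyr 1.0.0 onward. This is an illustrative rewrite, not the original answer's code. The variable names dplyr_means and dplyr_maxes are made up for the example, and the all-NA check is an added assumption so that such groups yield NA, as in the question's expected tables.

```r
library(dplyr)

# Sample data from the question, stored as d1.
d1 <- data.frame(GROUP  = c("A","A","A","G","G","F","F","E","T"),
                 FIRST  = c(10,2,3,6,NA,NA,NA,1,NA),
                 SECOND = c(3,NA,NA,1,NA,4,2,1,NA),
                 THIRD  = c(5,7,NA,NA,NA,1,NA,1,1))

# Group means: NA when every value in the group is missing.
dplyr_means <- d1 %>%
  group_by(GROUP) %>%
  summarise(across(everything(),
                   ~ if (all(is.na(.x))) NA_real_ else mean(.x, na.rm = TRUE)))

# Group maxima: same guard, which also avoids the -Inf warning from max().
dplyr_maxes <- d1 %>%
  group_by(GROUP) %>%
  summarise(across(everything(),
                   ~ if (all(is.na(.x))) NA_real_ else max(.x, na.rm = TRUE)))

print(dplyr_means)
print(dplyr_maxes)
```

Inside a grouped summarise(), everything() selects all non-grouping columns, so GROUP itself is not summarised.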