How do I get group-level statistics while keeping every row of the original data frame?

Posted: 2016-10-06 12:15:30

Tags: r data.table dplyr tidyr

I have the following data frame:

one <- c('one',NA,NA,NA,NA,'two',NA,NA)
group1 <- c('A','A','A','A','B','B','B','B')
group2 <- c('C','C','C','D','E','E','F','F')

df <- data.frame(one, group1, group2)


> df
   one group1 group2
1  one      A      C
2 <NA>      A      C
3 <NA>      A      C
4 <NA>      A      D
5 <NA>      B      E
6  two      B      E
7 <NA>      B      F
8 <NA>      B      F

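For each combination of group1 and group2, I want the number of non-missing values in the column one, repeated on every row of that combination.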

In pandas I would use groupby(['group1','group2']).transform, but how can I do this in R? The original data frame is large.

The expected output is:

> df
   one group1 group2 count
1  one      A      C     1
2 <NA>      A      C     1
3 <NA>      A      C     1
4 <NA>      A      D     0
5 <NA>      B      E     1
6  two      B      E     1
7 <NA>      B      F     0
8 <NA>      B      F     0

Many thanks!

3 answers:

Answer 0 (score: 6)

library(dplyr)

df %>% group_by(group1, group2) %>% mutate(count = sum(!is.na(one)))
Source: local data frame [8 x 4]
Groups: group1, group2 [4]

     one group1 group2 count
  <fctr> <fctr> <fctr> <int>
1    one      A      C     1
2     NA      A      C     1
3     NA      A      C     1
4     NA      A      D     0
5     NA      B      E     1
6    two      B      E     1
7     NA      B      F     0
8     NA      B      F     0
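The printed result is still grouped (see the "Groups:" line), so any later verb keeps working per group1 x group2 combination. A minimal sketch of two ways to end up with a plain, ungrouped data frame; it assumes dplyr 1.1.0 or later for the .by argument of mutate(), and the names res_ungrouped and res_by are only illustrative:

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which added the .by argument to mutate()

df <- data.frame(
  one    = c("one", NA, NA, NA, NA, "two", NA, NA),
  group1 = c("A", "A", "A", "A", "B", "B", "B", "B"),
  group2 = c("C", "C", "C", "D", "E", "E", "F", "F")
)

# Option 1: group, put the per-group count on every row, then drop the grouping
res_ungrouped <- df %>%
  group_by(group1, group2) %>%
  mutate(count = sum(!is.na(one))) %>%
  ungroup()

# Option 2: per-operation grouping; the result is never grouped and keeps the row order
res_by <- df %>%
  mutate(count = sum(!is.na(one)), .by = c(group1, group2))

print(res_by)
```

Both variants keep all 8 rows; .by simply avoids having to remember the ungroup() call.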

Answer 1 (score: 5)

Using data.table:

library(data.table)

setDT(df)  # converts df to a data.table in place, without copying
df[, count_B := sum(!is.na(one)), by = c("group1", "group2")]

This gives:

   one group1 group2 count_B
1: one      A      C       1
2:  NA      A      C       1
3:  NA      A      C       1
4:  NA      A      D       0
5:  NA      B      E       1
6: two      B      E       1
7:  NA      B      F       0
8:  NA      B      F       0

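The idea is to sum the logical values of !is.na(one) while grouping by group1 and group2. Each TRUE is coerced to the integer 1 and each FALSE to 0, so the sum is the number of rows in the group where one is not NA.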
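A small sketch of the coercion described above, followed by the same grouped := on a copy. It assumes the data.table package is installed and that you may want df to stay a plain data.frame; setDT(df) converts df itself, whereas as.data.table() returns a copy. The names flags and dt are only illustrative:

```r
library(data.table)  # assumes the data.table package is installed

# The coercion the answer relies on: TRUE counts as 1, FALSE as 0
flags <- !is.na(c("one", NA, NA))
sum(flags)  # 1

df <- data.frame(
  one    = c("one", NA, NA, NA, NA, "two", NA, NA),
  group1 = c("A", "A", "A", "A", "B", "B", "B", "B"),
  group2 = c("C", "C", "C", "D", "E", "E", "F", "F")
)

# as.data.table() returns a copy, so df itself stays a plain data.frame
# (setDT(df) would instead convert df in place)
dt <- as.data.table(df)

# := adds the column by reference to dt: one value per row, computed per group
dt[, count := sum(!is.na(one)), by = .(group1, group2)]

print(dt)
```

The copy costs memory for a large table, so setDT() as in the answer is the cheaper choice when converting df in place is acceptable.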

Answer 2 (score: 4)

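Let's not forget that a lot can be done in base R, although sometimes less efficiently than with data.table or dplyr:

# flag each row as 1 (one present) or 0 (one missing), then sum the flags within each group1 x group2 group;
# as.integer(df$one) would turn a character column into all NA, so the flag is built first
df$count <- ave(as.integer(!is.na(df$one)), df[, 2:3], FUN = sum)
df
#    one group1 group2 count
# 1  one      A      C     1
# 2 <NA>      A      C     1
# 3 <NA>      A      C     1
# 4 <NA>      A      D     0
# 5 <NA>      B      E     1
# 6  two      B      E     1
# 7 <NA>      B      F     0
# 8 <NA>      B      F     0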
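Another base R route, closer to the pandas transform idea of computing the statistic once per group and then broadcasting it back to the rows, is tapply() over a combined grouping factor. A sketch only; the names key and counts are illustrative:

```r
df <- data.frame(
  one    = c("one", NA, NA, NA, NA, "two", NA, NA),
  group1 = c("A", "A", "A", "A", "B", "B", "B", "B"),
  group2 = c("C", "C", "C", "D", "E", "E", "F", "F")
)

# One factor level per observed group1 x group2 combination
key <- interaction(df$group1, df$group2, drop = TRUE)

# Group-level statistic: number of non-missing 'one' values per combination
counts <- tapply(!is.na(df$one), key, sum)

# Map each group's count back onto its rows
# (indexing by a factor uses its integer codes, which match the order of counts)
df$count <- as.integer(counts[key])

print(df)
```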
