R: Assign a group value based on a subset of that group

Time: 2017-06-01 17:24:49

Tags: r group-by aggregate tidyverse

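Is there a way to assign the mean of one particular level within a group to the whole group? Below is an example of what I am trying to do. I am using library(tidyverse).

Suppose: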

> DF <- data.frame(A = c("P1","P1","P1","P2","P2","P2"), B = c("Yes","Yes","No","Yes","No","No"), C = c(10,10,2,20,3,3))
> DF
   A   B  C
1 P1 Yes 10
2 P1 Yes 10
3 P1  No  2
4 P2 Yes 20
5 P2  No  3
6 P2  No  3

I want to create a column "mean" that holds the mean of C over the rows where B == "Yes", grouped by "A":

> DF
   A   B  C  mean
1 P1 Yes 10  10
2 P1 Yes 10  10
3 P1  No  2  10
4 P2 Yes 20  20
5 P2  No  3  20
6 P2  No  3  20

Here is what I tried:

> DF %>% group_by(A) %>% mutate(temp = ifelse(B=="Yes", 1, 0), s= sum(temp), mean = sum(C*temp)/s)

# A tibble: 6 x 6
       A      B     C  temp     s  mean
  <fctr> <fctr> <dbl> <dbl> <dbl> <dbl>
1     P1    Yes    10     1     2    10
2     P1    Yes    10     1     2    10
3     P1     No     2     0     2    10
4     P2    Yes    20     1     1    20
5     P2     No     3     0     1    20
6     P2     No     3     0     1    20

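1 Answer:

Answer 0: (score: 1)

A fairly simple approach in base R is to compute the desired mean for each group and then merge those results back onto the original data.frame.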

merge(DF, aggregate(cbind(mean=C)~A, data=DF[DF$B=="Yes",], mean), by="A")
   A   B  C mean
1 P1 Yes 10   10
2 P1 Yes 10   10
3 P1  No  2   10
4 P2 Yes 20   20
5 P2  No  3   20
6 P2  No  3   20

The "trick" here is that the data.frame passed to aggregate contains only the "Yes" observations.
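To make the two steps easier to see, here is a minimal sketch that splits the one-liner apart. The names yes_means and result are only illustrative, and all.x = TRUE is an addition that is not part of the answer above: it assumes you want to keep groups that have no "Yes" rows, which would then get NA.

```r
# Same example data as in the question
DF <- data.frame(A = c("P1", "P1", "P1", "P2", "P2", "P2"),
                 B = c("Yes", "Yes", "No", "Yes", "No", "No"),
                 C = c(10, 10, 2, 20, 3, 3))

# Step 1: keep only the "Yes" rows, then average C within each level of A.
# cbind(mean = C) just names the resulting column "mean".
yes_means <- aggregate(cbind(mean = C) ~ A,
                       data = DF[DF$B == "Yes", ],
                       FUN = mean)
yes_means
#    A mean
# 1 P1   10
# 2 P2   20

# Step 2: join the per-group means back onto every row of the original data.
# all.x = TRUE (an assumption, see above) keeps groups without any "Yes" row.
result <- merge(DF, yes_means, by = "A", all.x = TRUE)
```

Without all.x = TRUE, merge performs an inner join, so a group with no "Yes" observations would disappear from the result.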

A more robust version of my data.table answer from the comments replaces .(...) with c(.SD, mean=...), as follows:

library(data.table)
setDT(DF)[, c(.SD, mean=mean(C[B=="Yes"])), by=A]
    A   B  C mean
1: P1 Yes 10   10
2: P1 Yes 10   10
3: P1  No  2   10
4: P2 Yes 20   20
5: P2  No  3   20
6: P2  No  3   20

With this replacement, any other variables in the data are passed through to the result as well.
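As a rough illustration of that pass-through, the sketch below adds a hypothetical extra column D (not in the original data) and checks that it survives. The := variant at the end is a common data.table alternative rather than something taken from the answer; it adds the column by reference instead of building a new table.

```r
library(data.table)

# Example data from the question plus a hypothetical extra column D
DT <- data.table(A = c("P1", "P1", "P1", "P2", "P2", "P2"),
                 B = c("Yes", "Yes", "No", "Yes", "No", "No"),
                 C = c(10, 10, 2, 20, 3, 3),
                 D = 1:6)

# .SD holds every non-grouping column of the current group,
# so B, C and D are all carried along next to the new mean column.
res <- DT[, c(.SD, mean = mean(C[B == "Yes"])), by = A]
names(res)

# Alternative: add the column to DT in place, leaving all other columns untouched
DT[, mean := mean(C[B == "Yes"]), by = A]
```

In both variants, a group without any "Yes" rows would get NaN, because mean() of an empty numeric vector is NaN.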