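Is there a way to assign the mean of a specific subset of a group to the entire group? Below is an example of what I am trying to do. I am using library(tidyverse).
Suppose: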
> DF <- data.frame(A = c("P1","P1","P1","P2","P2","P2"), B = c("Yes","Yes","No","Yes","No","No"), C = c(10,10,2,20,3,3))
> DF
A B C
1 P1 Yes 10
2 P1 Yes 10
3 P1 No 2
4 P2 Yes 20
5 P2 No 3
6 P2 No 3
I want to create a column "mean" that holds the mean of C over the rows where B = "Yes", grouped by "A":
> DF <- data.frame(A = c("P1","P1","P1","P2","P2","P2"), B = c("Yes","Yes","No","Yes","No","No"), C = c(10,10,2,20,3,3))
> DF
A B C mean
1 P1 Yes 10 10
2 P1 Yes 10 10
3 P1 No 2 10
4 P2 Yes 20 20
5 P2 No 3 20
6 P2 No 3 20
Here is what I tried:
> DF %>% group_by(A) %>% mutate(temp = ifelse(B=="Yes", 1, 0), s= sum(temp), mean = sum(C*temp)/s)
# A tibble: 6 x 6
A B C temp s mean
<fctr> <fctr> <dbl> <dbl> <dbl> <dbl>
1 P1 Yes 10 1 2 10
2 P1 Yes 10 1 2 10
3 P1 No 2 0 2 10
4 P2 Yes 20 1 1 20
5 P2 No 3 0 1 20
6 P2 No 3 0 1 20
Answer 0 (score: 1)
A fairly simple approach in base R is to compute the desired means by group and then merge those results onto the original data.frame.
merge(DF, aggregate(cbind(mean=C)~A, data=DF[DF$B=="Yes",], mean), by="A")
A B C mean
1 P1 Yes 10 10
2 P1 Yes 10 10
3 P1 No 2 10
4 P2 Yes 20 20
5 P2 No 3 20
6 P2 No 3 20
The "trick" here is that the data.frame fed to aggregate contains only the "Yes" observations.
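The one-liner above can be unpacked into three steps. This is only a sketch of the same idea; the intermediate names yes_rows, group_means, and result are made up for illustration and do not appear in the original answer.

```r
DF <- data.frame(A = c("P1", "P1", "P1", "P2", "P2", "P2"),
                 B = c("Yes", "Yes", "No", "Yes", "No", "No"),
                 C = c(10, 10, 2, 20, 3, 3))

# Step 1: keep only the rows where B is "Yes"
yes_rows <- DF[DF$B == "Yes", ]

# Step 2: compute one mean of C per level of A; cbind(mean = C) names the result column "mean"
group_means <- aggregate(cbind(mean = C) ~ A, data = yes_rows, FUN = mean)

# Step 3: attach the per-group mean to every row of the original data
result <- merge(DF, group_means, by = "A")
print(result)
```

Note that merge performs an inner join by default, so a group with no "Yes" rows would be dropped from the result. Passing all.x = TRUE to merge should keep such groups, with NA in the mean column.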
A more robust version of my data.table answer from the comments replaces .(...) with c(.SD, mean=...), as follows:
library(data.table)
setDT(DF)[, c(.SD, mean=mean(C[B=="Yes"])), by=A]
A B C mean
1: P1 Yes 10 10
2: P1 Yes 10 10
3: P1 No 2 10
4: P2 Yes 20 20
5: P2 No 3 20
6: P2 No 3 20
This replacement allows any other variables in the table to pass through to the result.
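To see the pass-through behaviour, here is a minimal sketch that adds an extra column D. D is hypothetical and not part of the original data; it exists only to show that columns other than B and C survive. The sketch also shows the := form, which is another common way to add a grouped column in data.table and is offered as an alternative rather than as the answer's own method.

```r
library(data.table)

DT <- data.table(A = c("P1", "P1", "P1", "P2", "P2", "P2"),
                 B = c("Yes", "Yes", "No", "Yes", "No", "No"),
                 C = c(10, 10, 2, 20, 3, 3),
                 D = 1:6)  # hypothetical extra column, used only for illustration

# .SD holds every non-grouping column of the current group (B, C, D);
# the single group mean is recycled to the length of the group
res <- DT[, c(.SD, mean = mean(C[B == "Yes"])), by = A]
print(res)

# Alternative: add the column to DT by reference, leaving all other columns in place
DT[, mean := mean(C[B == "Yes"]), by = A]
print(DT)
```

Both forms should give the same mean column. The := form modifies DT in place and keeps the original row order, while the c(.SD, ...) form returns a new table with rows ordered by group.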