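I want to add a few columns to the d2 data frame; these columns come from the "result" vector in the "d" data frame. Is there an easy way to do this?
Here is the "d" data frame. Note that the result column is the mean for each group & flag combination, i.e. 3 is the mean of group A when flag = 0.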
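library(dplyr)

d = data.frame(x = c(seq(1, 5, 1), seq(11, 15, 1), 100, 1000),
               group = c(rep("A", 5), rep("B", 5), "A", "B"))
d = d %>%
  group_by(group) %>%
  mutate(
    U = quantile(x, 0.75) + 1.5 * IQR(x),   # upper fence per group
    L = quantile(x, 0.25) - 1.5 * IQR(x),   # lower fence per group
    flag = ifelse(x > U | x < L, 1, 0),     # 1 = outside the fences
    mu = mean(x)                            # mean per group
  ) %>%
  group_by(group, flag) %>%
  mutate(result = mean(x))                  # mean per group & flag
as.data.frame(d)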
      x group    U    L flag        mu result
1     1     A  8.5 -1.5    0  19.16667      3
2     2     A  8.5 -1.5    0  19.16667      3
3     3     A  8.5 -1.5    0  19.16667      3
4     4     A  8.5 -1.5    0  19.16667      3
5     5     A  8.5 -1.5    0  19.16667      3
6    11     B 18.5  8.5    0 177.50000     13
7    12     B 18.5  8.5    0 177.50000     13
8    13     B 18.5  8.5    0 177.50000     13
9    14     B 18.5  8.5    0 177.50000     13
10   15     B 18.5  8.5    0 177.50000     13
11  100     A  8.5 -1.5    1  19.16667    100
12 1000     B 18.5  8.5    1 177.50000   1000
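Now I want a summary table that shows the means already present in the "mu" column, but I would also like to add two columns, "mu_1" and "mu_0", which I added manually below. Is there an efficient way to do this?
Thanks.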
d2 = d %>%
  group_by(group) %>%
  summarise(U = mean(U),
            L = mean(L),
            mu = mean(mu)
  )
as.data.frame(d2)
  group    U    L        mu mu_1 mu_0
1     A  8.5 -1.5  19.16667  100    3
2     B 18.5  8.5 177.50000 1000   13
Answer 0 (score: 3)
You could do:
d %>%
  group_by(group) %>%
  summarise(U = mean(U), L = mean(L), mu = mean(mu),
            mu_1 = mean(result[flag == 1]),
            mu_0 = mean(result[flag == 0]))
Which gives:
## A tibble: 2 x 6
# group U L mu mu_1 mu_0
# <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A 8.5 -1.5 19.16667 100 3
#2 B 18.5 8.5 177.50000 1000 13
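Since result is constant within each group & flag combination, the same two columns can also be computed straight from x, without the intermediate result column. The following is only a minimal sketch under that assumption; it rebuilds d from the question so that it runs on its own, and the name d2 is reused from the question for illustration.

```r
library(dplyr)

# Rebuild the question's data (assumed identical to the original d,
# minus the result column, which is not needed here)
d <- data.frame(x = c(seq(1, 5, 1), seq(11, 15, 1), 100, 1000),
                group = c(rep("A", 5), rep("B", 5), "A", "B"))

d <- d %>%
  group_by(group) %>%
  mutate(U = quantile(x, 0.75) + 1.5 * IQR(x),
         L = quantile(x, 0.25) - 1.5 * IQR(x),
         flag = ifelse(x > U | x < L, 1, 0),
         mu = mean(x)) %>%
  ungroup()

# Conditional means inside summarise(): subset x by flag within each group
d2 <- d %>%
  group_by(group) %>%
  summarise(U = mean(U), L = mean(L), mu = mean(mu),
            mu_1 = mean(x[flag == 1]),
            mu_0 = mean(x[flag == 0]))

as.data.frame(d2)
```

One thing to keep in mind: if a group has no rows with flag == 1, the subset is empty and mean() returns NaN for that group.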
Or you could compute two different summaries (one grouped by group and flag, the other grouped by group) and left_join() them together:
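library(dplyr)
library(tidyr)

# Note: summarise_each() and funs() are deprecated in current dplyr releases
d %>%
  group_by(group, flag) %>%
  summarise(mean = mean(result)) %>%
  # one column per flag value, named flag-mu0 and flag-mu1
  spread(flag, mean, sep = "-mu") %>%
  # the per-group summary is the left table; `.` is the spread result
  left_join(d %>%
              group_by(group) %>%
              summarise_each(funs(mean), U, L, mu), .)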
Which gives:
## A tibble: 2 x 6
# group U L mu flag-mu0 flag-mu1
# <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A 8.5 -1.5 19.16667 3 100
#2 B 18.5 8.5 177.50000 13 1000