Creating a summary table after dplyr operations

Asked: 2016-07-14 16:46:24

Tags: r dplyr

I would like to add a few columns to the d2 data frame, taken from the "result" column of the "d" data frame. Is there a simple way to do this?

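Here is the "d" data frame. Note that the result column holds the mean of x for each combination of group and flag. For example, 3 is the mean for group A when flag = 0.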

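    library(dplyr)

    d = data.frame(x = c(seq(1, 5, 1), seq(11, 15, 1), 100, 1000),
                   group = c(rep("A", 5), rep("B", 5), "A", "B"))
    d = d %>%
      group_by(group) %>%
      mutate(
        U = quantile(x, 0.75) + 1.5 * IQR(x),  # upper outlier fence
        L = quantile(x, 0.25) - 1.5 * IQR(x),  # lower outlier fence
        flag = ifelse(x > U | x < L, 1, 0),    # 1 = outside the fences
        mu = mean(x)                           # mean per group
      ) %>%
      group_by(group, flag) %>%
      mutate(result = mean(x))                 # mean per group and flag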

    as.data.frame(d)
          x group    U    L flag        mu result
    1     1     A  8.5 -1.5    0  19.16667      3
    2     2     A  8.5 -1.5    0  19.16667      3
    3     3     A  8.5 -1.5    0  19.16667      3
    4     4     A  8.5 -1.5    0  19.16667      3
    5     5     A  8.5 -1.5    0  19.16667      3
    6    11     B 18.5  8.5    0 177.50000     13
    7    12     B 18.5  8.5    0 177.50000     13
    8    13     B 18.5  8.5    0 177.50000     13
    9    14     B 18.5  8.5    0 177.50000     13
    10   15     B 18.5  8.5    0 177.50000     13
    11  100     A  8.5 -1.5    1  19.16667    100
    12 1000     B 18.5  8.5    1 177.50000   1000
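As a quick sanity check on the numbers above, the fences and per-flag means for group A can be reproduced in base R. This is only a sketch: the names `x_a`, `upper`, `lower`, and `flag_means` are made up for illustration, and the values of `x_a` are copied by hand from the table.

```r
# Values of x in group A, copied from the data frame above
x_a <- c(1, 2, 3, 4, 5, 100)

# Same fence formulas as in the mutate() call (default quantile type)
upper <- unname(quantile(x_a, 0.75) + 1.5 * IQR(x_a))
lower <- unname(quantile(x_a, 0.25) - 1.5 * IQR(x_a))

# 1 marks values outside the fences, 0 marks values inside
flag <- ifelse(x_a > upper | x_a < lower, 1, 0)

# Mean of x within each flag value
flag_means <- tapply(x_a, flag, mean)

print(c(upper = upper, lower = lower))
print(flag_means)
```

Only the value 100 falls outside the fences, which is why group A has a single row with flag = 1.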

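Now I want a summary table that shows the values already present in the "mu" column, plus two additional columns, "mu_1" and "mu_0", which I added by hand below. Is there an efficient way to do this?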

Thanks.

    d2 = d %>%
      group_by(group) %>%
      summarise(U = mean(U),
                L = mean(L),
                mu = mean(mu))
    as.data.frame(d2)



Desired output:

      group    U    L        mu mu_1 mu_0
    1     A  8.5 -1.5  19.16667  100    3
    2     B 18.5  8.5 177.50000 1000   13

1 Answer:

Answer 0 (score: 3)

You can do it like this:

    d %>%
      group_by(group) %>%
      summarise(U = mean(U), L = mean(L), mu = mean(mu),
                mu_1 = mean(result[flag == 1]),
                mu_0 = mean(result[flag == 0]))

which gives:

    ## A tibble: 2 x 6
    #   group     U     L        mu  mu_1  mu_0
    #  <fctr> <dbl> <dbl>     <dbl> <dbl> <dbl>
    #1      A   8.5  -1.5  19.16667   100     3
    #2      B  18.5   8.5 177.50000  1000    13
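Since `result` is just the mean of `x` within each group and flag, the same two columns can presumably be computed straight from `x`, without the intermediate `result` column. A minimal sketch of that variation, rebuilding the question's data so it runs on its own (the name `d2` is reused from the question; this is not the answerer's code):

```r
library(dplyr)

# Rebuild the example data from the question
d <- data.frame(
  x = c(1:5, 11:15, 100, 1000),
  group = c(rep("A", 5), rep("B", 5), "A", "B")
)

d2 <- d %>%
  group_by(group) %>%
  mutate(
    U = quantile(x, 0.75) + 1.5 * IQR(x),
    L = quantile(x, 0.25) - 1.5 * IQR(x),
    flag = ifelse(x > U | x < L, 1, 0)
  ) %>%
  summarise(
    U = mean(U),
    L = mean(L),
    mu = mean(x),
    mu_1 = mean(x[flag == 1]),  # NaN if a group has no flagged rows
    mu_0 = mean(x[flag == 0])   # NaN if every row in a group is flagged
  )

as.data.frame(d2)
```

One thing to watch with this subsetting pattern: a group with no rows for a given flag value yields `NaN` rather than an error, because `mean()` of an empty numeric vector is `NaN`.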

Alternatively, you can compute two separate summaries (one grouped by group and flag, the other grouped by group only) and left_join() them together:

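    library(dplyr)
    library(tidyr)
    # Note: summarise_each() and funs() are deprecated in later dplyr releases,
    # and spread() is superseded by pivot_wider() in later tidyr releases.
    d %>%
      group_by(group, flag) %>%
      summarise(mean = mean(result)) %>%
      spread(flag, mean, sep = "-mu") %>%
      left_join(d %>%
                  group_by(group) %>%
                  summarise_each(funs(mean), U, L, mu), .)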

which gives:

    ## A tibble: 2 x 6
    #   group     U     L        mu flag-mu0 flag-mu1
    #  <fctr> <dbl> <dbl>     <dbl>    <dbl>    <dbl>
    #1      A   8.5  -1.5  19.16667        3      100
    #2      B  18.5   8.5 177.50000       13     1000
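The two-summary approach relies on `spread()`, `summarise_each()`, and `funs()`, which later tidyr and dplyr releases supersede or deprecate. Below is a rough sketch of the same idea, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later. The names `by_flag`, `by_group`, and `d3` are illustrative, and the per-flag mean is taken over `x` directly, which matches `result` by construction.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data and the per-group columns from the question
d <- data.frame(
  x = c(1:5, 11:15, 100, 1000),
  group = c(rep("A", 5), rep("B", 5), "A", "B")
) %>%
  group_by(group) %>%
  mutate(
    U = quantile(x, 0.75) + 1.5 * IQR(x),
    L = quantile(x, 0.25) - 1.5 * IQR(x),
    flag = ifelse(x > U | x < L, 1, 0),
    mu = mean(x)
  ) %>%
  ungroup()

# Summary 1: one row per group and flag, then one column per flag value
by_flag <- d %>%
  group_by(group, flag) %>%
  summarise(mean = mean(x), .groups = "drop") %>%
  pivot_wider(names_from = flag, values_from = mean, names_prefix = "mu_")

# Summary 2: one row per group
by_group <- d %>%
  group_by(group) %>%
  summarise(across(c(U, L, mu), mean))

# Join the two summaries on the grouping column
d3 <- left_join(by_group, by_flag, by = "group")
as.data.frame(d3)
```

Here `names_prefix = "mu_"` produces the column names `mu_0` and `mu_1` directly, so no renaming is needed after the join.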