Aggregating data with dplyr, conditionally summing column values based on the values in other columns

Time: 2019-07-19 10:02:14

Tags: r dplyr

I have the following data frame:

df <- structure(list(individual = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), 
      .Label = c("ind.1", "ind.2", "ind.3"), class = "factor"),
    trait = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), 
      .Label = c("blue", "green", "yellow"), class = "factor"), 
    year = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L), 
      .Label = c("1", "2"), class = "factor"), 
    flag.1 = structure(c(2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), 
      .Label = c("0", "1"), class = "factor"), 
    flag.2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), 
      .Label = c("0", "1"), class = "factor"),
    quantity = c(10L, 13L, 43L, 19L, 3L, 10L, 4L, 6L)), 
  row.names = c(NA, -8L), 
  class = "data.frame")

> df
  individual  trait year flag.1 flag.2 quantity
1      ind.1   blue    2      1      0       10
2      ind.2  green    1      0      0       13
3      ind.2  green    2      0      0       43
4      ind.2  green    2      0      0       19
5      ind.3 yellow    1      1      1        3
6      ind.3 yellow    2      1      1       10
7      ind.3 yellow    2      1      1        4
8      ind.3 yellow    1      1      1        6

I am trying to use the dplyr package to summarise the data so that I end up with the following data frame:

  individual   trait  flag.1   flag.2   sum.quantity.year.1   sum.quantity.year.2
1      ind.1    blue       1        0                     0                    10    
2      ind.2   green       0        0                    13                    62    
3      ind.3  yellow       1        1                     9                    14    

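Here sum.quantity.year.1 is the sum of that individual's quantity column over the rows where year == 1, and likewise sum.quantity.year.2 is the sum of that individual's quantity column over the rows where year == 2.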
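To make the definition concrete, here is a minimal base-R sketch of the conditional sum for ind.3 alone. The vectors are copied by hand from rows 5 to 8 above; the variable names are only illustrative.

```r
# ind.3's rows from the data frame above (rows 5-8)
year     <- factor(c("1", "2", "2", "1"))
quantity <- c(3L, 10L, 4L, 6L)

# logical indexing keeps only the quantities whose year matches, then sums them
sum.quantity.year.1 <- sum(quantity[year == "1"])
sum.quantity.year.2 <- sum(quantity[year == "2"])

c(sum.quantity.year.1, sum.quantity.year.2)
#> [1]  9 14
```

An individual with no rows for a given year (ind.1 in year 1) gets 0, because `sum()` of an empty vector is 0.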

I have tried various piped combinations of group_by(), mutate(), summarise() and transmute(), to no avail. How should I approach this?

1 Answer:

Answer 0 (score: 1)

After grouping by individual and trait, you can use summarise:

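library(dplyr)
df %>%
   group_by(individual, trait)  %>%
   # the flags are constant within each individual, so keep the first value
   summarise(flag.1 = first(flag.1), 
             flag.2 = first(flag.2),
             # logical indexing: sum only the quantities from the given year
             quantity.year.1 = sum(quantity[year == 1]), 
             quantity.year.2 = sum(quantity[year == 2]))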

#  individual trait  flag.1 flag.2  quantity.year.1 quantity.year.2
#  <fct>      <fct>  <fct>  <fct>            <int>           <int>
#1 ind.1      blue   1      0                    0              10
#2 ind.2      green  0      0                   13              62
#3 ind.3      yellow 1      1                    9              14

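However, if you have many columns like quantity, a better option is to gather them into long format, compute the sums by group, and then spread the result back to wide format.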

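library(dplyr)
library(tidyr)

df %>%
  # long format: one row per original row and measured column; key holds the column name
  gather(key, value, quantity) %>%
  # keep the flag columns in the grouping so they survive summarise()
  group_by(individual, trait, flag.1, flag.2, key, year) %>%
  summarise(sum = sum(value)) %>%
  ungroup() %>%
  # build the target column names, e.g. "sum.quantity.year.1"
  mutate(year = paste0("sum.", key, ".year.", year)) %>%
  select(-key) %>%
  # back to wide format; missing combinations (ind.1 in year 1) become 0
  spread(year, sum, fill = 0)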
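Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Below is a sketch of the long-to-wide approach with the newer verbs, keeping the flag columns in the grouping so they appear in the result. It assumes tidyr >= 1.1.0 (for names_glue, names_sort and a scalar values_fill) and rebuilds df compactly so the block runs on its own.

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.1.0 for names_glue, names_sort and scalar values_fill

# the question's data, rebuilt compactly with the same factor levels
df <- data.frame(
  individual = factor(c("ind.1", rep("ind.2", 3), rep("ind.3", 4))),
  trait      = factor(c("blue", rep("green", 3), rep("yellow", 4))),
  year       = factor(c(2, 1, 2, 2, 1, 2, 2, 1)),
  flag.1     = factor(c(1, 0, 0, 0, 1, 1, 1, 1)),
  flag.2     = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  quantity   = c(10L, 13L, 43L, 19L, 3L, 10L, 4L, 6L)
)

res <- df %>%
  # long format: one row per original row and measured column
  pivot_longer(quantity, names_to = "key", values_to = "value") %>%
  group_by(individual, trait, flag.1, flag.2, key, year) %>%
  summarise(sum = sum(value)) %>%
  ungroup() %>%
  # wide again: one column per (measured column, year); missing cells become 0
  pivot_wider(
    names_from  = c(key, year),
    names_glue  = "sum.{key}.year.{year}",
    names_sort  = TRUE,
    values_from = sum,
    values_fill = 0
  )
res
```

Adding more measured columns to the pivot_longer() selection produces one sum.<column>.year.<year> pair per column without further changes.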