I have the following data frame:
df <- structure(list(individual = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
.Label = c("ind.1", "ind.2", "ind.3"), class = "factor"),
trait = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
.Label = c("blue", "green", "yellow"), class = "factor"),
year = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L),
.Label = c("1", "2"), class = "factor"),
flag.1 = structure(c(2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("0", "1"), class = "factor"),
flag.2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("0", "1"), class = "factor"),
quantity = c(10L, 13L, 43L, 19L, 3L, 10L, 4L, 6L)),
row.names = c(NA, -8L),
class = "data.frame")
> df
individual trait year flag.1 flag.2 quantity
1 ind.1 blue 2 1 0 10
2 ind.2 green 1 0 0 13
3 ind.2 green 2 0 0 43
4 ind.2 green 2 0 0 19
5 ind.3 yellow 1 1 1 3
6 ind.3 yellow 2 1 1 10
7 ind.3 yellow 2 1 1 4
8 ind.3 yellow 1 1 1 6
I am trying to use the dplyr package to summarise the data so that I end up with the following data frame:
individual trait flag.1 flag.2 sum.quantity.year.1 sum.quantity.year.2
1 ind.1 blue 1 0 0 10
2 ind.2 green 0 0 13 62
3 ind.3 yellow 1 1 9 14
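where sum.quantity.year.1 is the sum of the quantity column for that individual over the rows where year == 1, and likewise sum.quantity.year.2 is the sum of the quantity column for that individual over the rows where year == 2.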
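As a quick sanity check of what the two sum columns should contain, base R's xtabs() can cross-tabulate the summed quantity for every individual/year combination. This is only a sketch that reproduces the target numbers (it ignores the flag columns), and the compact data.frame() call is an assumed rebuild of the dput() output above with the same values and factor levels.

```r
# Assumed compact rebuild of the question's data (same values and levels as the dput() above)
df <- data.frame(
  individual = factor(c("ind.1", "ind.2", "ind.2", "ind.2", "ind.3", "ind.3", "ind.3", "ind.3")),
  trait      = factor(c("blue", "green", "green", "green", "yellow", "yellow", "yellow", "yellow")),
  year       = factor(c(2, 1, 2, 2, 1, 2, 2, 1)),
  flag.1     = factor(c(1, 0, 0, 0, 1, 1, 1, 1)),
  flag.2     = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  quantity   = c(10L, 13L, 43L, 19L, 3L, 10L, 4L, 6L)
)

# With a left-hand side, xtabs() sums quantity per individual/year cell;
# cells with no rows (ind.1 in year 1) are reported as 0
sums <- xtabs(quantity ~ individual + year, data = df)
print(sums)
```

For example, ind.3 has 3 + 6 = 9 in year 1 and 10 + 4 = 14 in year 2, which matches the desired table.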
I have tried various piped combinations of group_by(), mutate(), summarise() and transmute(), to no avail. How should one approach this?
Answer 0 (score: 1)
After grouping by individual and trait, you can use summarise:
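library(dplyr)
df %>%
  group_by(individual, trait) %>%
  # the flags are constant within each individual, so take the first value
  summarise(flag.1 = first(flag.1),
            flag.2 = first(flag.2),
            # sum quantity only over the rows of that year (0 if there are none)
            quantity.year.1 = sum(quantity[year == 1]),
            quantity.year.2 = sum(quantity[year == 2]))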
# individual trait flag.1 flag.2 quantity.year.1 quantity.year.2
# <fct> <fct> <fct> <fct> <int> <int>
#1 ind.1 blue 1 0 0 10
#2 ind.2 green 0 0 13 62
#3 ind.3 yellow 1 1 9 14
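A possible variant, sketched under the assumption that flag.1 and flag.2 never change within an individual: put the flags into group_by() instead of picking them with first(). If a flag ever did vary, this would show up as an extra row rather than being silently dropped. The .groups = "drop" argument assumes dplyr 1.0.0 or later, and the data.frame() call is an assumed compact rebuild of the question's data.

```r
library(dplyr)

# Assumed compact rebuild of the question's data (same values and levels as the dput() above)
df <- data.frame(
  individual = factor(c("ind.1", "ind.2", "ind.2", "ind.2", "ind.3", "ind.3", "ind.3", "ind.3")),
  trait      = factor(c("blue", "green", "green", "green", "yellow", "yellow", "yellow", "yellow")),
  year       = factor(c(2, 1, 2, 2, 1, 2, 2, 1)),
  flag.1     = factor(c(1, 0, 0, 0, 1, 1, 1, 1)),
  flag.2     = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  quantity   = c(10L, 13L, 43L, 19L, 3L, 10L, 4L, 6L)
)

res <- df %>%
  # flags as grouping variables (assumes they are constant per individual)
  group_by(individual, trait, flag.1, flag.2) %>%
  # year is a factor, so compare against its level labels
  summarise(sum.quantity.year.1 = sum(quantity[year == "1"]),
            sum.quantity.year.2 = sum(quantity[year == "2"]),
            .groups = "drop")
print(res)
```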
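However, if you have many such quantity columns, a better option is to gather them into long format, compute the sums by group and then spread the result back to wide format.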
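library(dplyr)
library(tidyr)
df %>%
  # stack the quantity column(s) into key/value pairs
  gather(key, value, quantity) %>%
  # keep the flags as grouping variables so they survive summarise()
  group_by(individual, trait, flag.1, flag.2, year) %>%
  summarise(sum = sum(value)) %>%
  # turn the year into the target column name
  mutate(year = paste0("sum.quantity.year.", year)) %>%
  # one column per year; missing individual/year combinations become 0
  spread(year, sum, fill = 0)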
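Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same long-then-wide idea with the newer functions might look like this; it assumes tidyr 1.1.0 or later (for names_glue, names_sort and a scalar values_fill) and dplyr 1.0.0 or later (for .groups), and the data.frame() call is an assumed compact rebuild of the question's data. Keeping key in the grouping means each original quantity-like column gets its own per-year columns.

```r
library(dplyr)
library(tidyr)

# Assumed compact rebuild of the question's data (same values and levels as the dput() above)
df <- data.frame(
  individual = factor(c("ind.1", "ind.2", "ind.2", "ind.2", "ind.3", "ind.3", "ind.3", "ind.3")),
  trait      = factor(c("blue", "green", "green", "green", "yellow", "yellow", "yellow", "yellow")),
  year       = factor(c(2, 1, 2, 2, 1, 2, 2, 1)),
  flag.1     = factor(c(1, 0, 0, 0, 1, 1, 1, 1)),
  flag.2     = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  quantity   = c(10L, 13L, 43L, 19L, 3L, 10L, 4L, 6L)
)

res <- df %>%
  # long format: one row per original row and quantity-like column
  pivot_longer(quantity, names_to = "key", values_to = "value") %>%
  # sum per individual, key and year; flags kept as grouping variables
  group_by(individual, trait, flag.1, flag.2, key, year) %>%
  summarise(sum = sum(value), .groups = "drop") %>%
  # wide format: one column per key/year, e.g. "sum.quantity.year.1";
  # names_sort puts year 1 before year 2, values_fill turns gaps into 0
  pivot_wider(names_from = c(key, year), values_from = sum,
              names_glue = "sum.{key}.year.{year}",
              names_sort = TRUE, values_fill = 0)
print(res)
```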