I have a data frame containing the number of new items of each type created per year:
# Sample data
df = data.frame(n_new = c(1, 1, 2, 4, 5, 3),
type = c("a", "b", "a", "b", "a", "a"),
year = c(2000, 2000, 2001, 2003, 2004, 2005))
df
# n_new type year
# 1 1 a 2000
# 2 1 b 2000
# 3 2 a 2001
# 4 4 b 2003
# 5 5 a 2004
# 6 3 a 2005
Because these items continue to exist in later years, I want to aggregate them into a running total of existing items per type...
# Expected result
df$n_total = c(1, 1, 3, 5, 8, 11)
df
# n_new type year n_total
# 1 1 a 2000 1
# 2 1 b 2000 1
# 3 2 a 2001 3
# 4 4 b 2003 5
# 5 5 a 2004 8
# 6 3 a 2005 11
To do this, I tried adding each value (per type) to the value from the previous year...
df$n_total[df$type = "a"] <- df$n_new[df$type = "a"] +
df$n_new[df$type = "a" & df$year - 1]
# It obviously doesn't work ;-)
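It seems like it should be obvious, yet I can't work out how to refer to year - 1... I could do this with a for loop, but I'm sure there is a better solution. I just can't put my finger on it!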
Answer 0 (score: 1)
You can do this with dplyr's group_by and mutate functions plus cumsum, as follows:
library(dplyr)
df = data.frame(n_new = c(1, 1, 2, 4, 5, 3),
type = c("a", "b", "a", "b", "a", "a"),
year = c(2000, 2000, 2001, 2003, 2004, 2005))
# Running total of n_new within each type, in the current row order
df %>% group_by(type) %>% mutate(ntotalbytype = cumsum(n_new))
Result:
n_new type year ntotalbytype
<dbl> <fctr> <dbl> <dbl>
1 1 a 2000 1
2 1 b 2000 1
3 2 a 2001 3
4 4 b 2003 5
5 5 a 2004 8
6 3 a 2005 11
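One caveat: cumsum accumulates in whatever order the rows happen to be in, so the grouped total is only a "total up to that year" if each type is sorted chronologically. The sample data is already sorted. A minimal sketch for the unsorted case, assuming a shuffled copy of the same data (the name df_shuffled is made up for illustration):

```r
library(dplyr)

# Same six rows as the sample data, deliberately shuffled (illustrative assumption)
df_shuffled <- data.frame(n_new = c(5, 1, 3, 1, 4, 2),
                          type  = c("a", "b", "a", "a", "b", "a"),
                          year  = c(2004, 2000, 2005, 2000, 2003, 2001))

res <- df_shuffled %>%
  arrange(type, year) %>%                   # chronological order within each type
  group_by(type) %>%
  mutate(ntotalbytype = cumsum(n_new)) %>%  # running total per type
  ungroup()
res
```

After arrange, the rows come out grouped by type rather than in the original order, so sort again by year afterwards if the original layout matters.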
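If you would rather not depend on dplyr, the same grouped running total can probably be had in base R with ave, which applies a function within each group and returns the values in the original row order. This is a sketch of an alternative, not part of the answer above, and it likewise assumes the rows are already sorted by year within each type:

```r
df <- data.frame(n_new = c(1, 1, 2, 4, 5, 3),
                 type  = c("a", "b", "a", "b", "a", "a"),
                 year  = c(2000, 2000, 2001, 2003, 2004, 2005))

# ave() splits n_new by type, applies cumsum to each piece,
# and puts the results back in the original row positions
df$n_total <- ave(df$n_new, df$type, FUN = cumsum)
df$n_total
# 1 1 3 5 8 11
```

This writes the result straight into the n_total column asked for in the question, without reordering the data frame.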