SumIfs in R - subset on multiple conditions and sum a specific column

Time: 2017-11-08 05:50:44

Tags: r sum subset sumifs

I have a set of panel data similar to the following:

city <- c("ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR")
week <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5)
df <- data.frame(city, week)  # keeps week numeric; cbind() would first coerce both columns to character
df$x <- c(6, 3, 9, 12, 4, 3, 7, 8, 2, 12, 15, 6, 3, 9, 0, 14, 18, 2, 21, 15, 17, 9, 10, 1, 22)

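I want to create a new variable df$y that, for each city and week, sums df$x over all of that city's weeks before the current observation's week. For example, df$y[25] should equal 31, because sum(df[df$city == "CAR" & df$week < 5, 3]) equals 31.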
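A worked check of that example for CAR, whose x values in weeks 1 to 5 are 4, 12, 0, 15 and 22: y for week w is the sum of x over weeks 1 to w - 1, so week 1 gets 0 and week 5 gets 4 + 12 + 0 + 15 = 31. The vector name x_car is only for illustration.

```r
# x values for CAR in weeks 1-5, copied from the data above
x_car <- c(4, 12, 0, 15, 22)

# y for week w = sum of x over weeks 1..(w - 1); seq_len(0) is empty, so week 1 gets 0
y_car <- sapply(1:5, function(w) sum(x_car[seq_len(w - 1)]))
y_car
#> [1]  0  4 16 16 31
```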

My question is: how can I compute this automatically, for example with a function?

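Writing sum(df[df$city == "CAR" & df$week < 5, 3]) by hand for every team and week combination would be tedious. My natural inclination is to write something like df$y <- sum(df[df$city == df$city & df$week < df$week, 3]), but that doesn't make sense. I'm new to R and don't fully understand functions; is that nonetheless the best route for what I'm trying to do?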

Thanks for your help!
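The attempted df$y <- sum(df[df$city == df$city & df$week < df$week, 3]) compares every row with itself: df$city == df$city is always TRUE and df$week < df$week is always FALSE, so the sum is 0 and that single value is recycled to every row. The comparison has to be made against one row at a time. A minimal base R sketch of that idea, which applies the question's own expression with row i's city and week in place of "CAR" and 5 (the data are rebuilt so the block runs on its own):

```r
# Same toy data as the question
city <- rep(c("ARI", "ATL", "BAL", "BUF", "CAR"), times = 5)
week <- rep(1:5, each = 5)
x <- c(6, 3, 9, 12, 4, 3, 7, 8, 2, 12, 15, 6, 3, 9, 0,
       14, 18, 2, 21, 15, 17, 9, 10, 1, 22)
df <- data.frame(city, week, x)

# For row i: sum x over rows with the same city and a strictly earlier week,
# i.e. sum(df[df$city == "CAR" & df$week < 5, 3]) with "CAR" and 5 taken from row i
df$y <- sapply(seq_len(nrow(df)), function(i) {
  sum(df$x[df$city == df$city[i] & df$week < df$week[i]])
})

df$y[25]
#> [1] 31
```

This compares every row against every other row, so the work grows quadratically with the number of rows; in exchange it follows the definition literally and does not depend on the rows being sorted.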

2 answers:

Answer 0 (score: 0)

An option with dplyr:
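library(dplyr)
res <- df %>% 
         group_by(city) %>% 
         # lag() shifts x down one row within each city (the first week gets 0),
         # so the running sum covers only earlier weeks; assumes rows are in week order
         mutate(y = cumsum(lag(x, default = 0)))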
res[25,]
# A tibble: 1 x 4
# Groups:   city [1]
#    city  week     x     y
#   <fctr> <dbl> <dbl> <dbl>
#1    CAR     5    22    31

Answer 1 (score: 0)

An option with data.table:
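library(data.table)
# Within each city: 0 for the first week, then the running sum of all but the last x,
# i.e. the total of x over the earlier weeks (assumes rows are in week order)
setDT(df)[, y := c(0, cumsum(x[-length(x)])), by = 'city']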
df
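Both answers compute a running sum of the earlier rows within each city, and both rely on each city's rows already being in week order (true for the question's data). A base R sketch of the same idea with no package dependency, using ave() plus an explicit order() so the input row order does not matter. Like the answers, it assumes one row per city and week; with duplicate weeks, a same-week row would be counted as "earlier".

```r
# Same toy data as the question (rebuilt so this block runs on its own)
city <- rep(c("ARI", "ATL", "BAL", "BUF", "CAR"), times = 5)
week <- rep(1:5, each = 5)
x <- c(6, 3, 9, 12, 4, 3, 7, 8, 2, 12, 15, 6, 3, 9, 0,
       14, 18, 2, 21, 15, 17, 9, 10, 1, 22)
df <- data.frame(city, week, x)

# Visit rows city by city in week order, without reordering df itself
o <- order(df$city, df$week)

# Within each city: running total minus the current x = total of earlier weeks.
# cumsum(v) - v gives the same values as c(0, cumsum(v[-length(v)])).
df$y <- NA_real_
df$y[o] <- ave(df$x[o], df$city[o], FUN = function(v) cumsum(v) - v)

df$y[df$city == "CAR" & df$week == 5]
#> [1] 31
```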