Question

I have a panel dataset similar to the following:

city <- c("ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR")
week <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5)
df <- as.data.frame(cbind(city, week))
df$week <- as.numeric(df$week)
df$x <- c(6, 3, 9, 12, 4, 3, 7, 8, 2, 12, 15, 6, 3, 9, 0, 14, 18, 2, 21, 15, 17, 9, 10, 1, 22)

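I want to create a new variable df$y that, for each city and week, sums df$x over all weeks before the week of the current observation. For example, df$y[25] should equal 31, because sum(df[df$city == "CAR" & df$week < 5, 3]) equals 31.

My question is: how can I automate this in a function?

Using sum(df[df$city == "CAR" & df$week < 5, 3]) for every team and week combination would be tedious. My natural inclination is to write something like df$y <- sum(df[df$city == df$city & df$week < df$week, 3]), but that doesn't make sense: both comparisons are a column against itself, so the first is always TRUE and the second always FALSE. I'm new to R and don't fully understand functions; even so, is a function the best route for what I'm trying to do?

Thanks for your help!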

Answer 1

One option with dplyr:

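library(dplyr)
# Within each city, lag() shifts x down one row (0 for the first row), so
# cumsum() totals only earlier rows. Assumes rows are sorted by week within city.
res <- df %>%
         group_by(city) %>%
         mutate(y = cumsum(lag(x, default = 0)))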
res[25,]
# A tibble: 1 x 4
# Groups:   city [1]
#    city  week     x     y
#   <fctr> <dbl> <dbl> <dbl>
#1    CAR     5    22    31
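
As a cross-check, the definition from the question can also be written out row by row in base R. This is only a minimal sketch: the names prior_sum and y_check are hypothetical helpers introduced here, and the data frame is rebuilt from the question's values so the block runs on its own. It compares every row against every other row, so it is slower on large data, but it does not depend on the rows being sorted.

```r
# Rebuild the example data from the question.
city <- rep(c("ARI", "ATL", "BAL", "BUF", "CAR"), times = 5)
week <- rep(1:5, each = 5)
x <- c(6, 3, 9, 12, 4, 3, 7, 8, 2, 12, 15, 6, 3, 9, 0,
       14, 18, 2, 21, 15, 17, 9, 10, 1, 22)
df <- data.frame(city, week, x)

# prior_sum (hypothetical helper): for row i, sum x over the rows that have
# the same city and a strictly earlier week.
prior_sum <- function(i) {
  sum(df$x[df$city == df$city[i] & df$week < df$week[i]])
}

# y_check (hypothetical column name): apply the helper to every row index.
df$y_check <- vapply(seq_len(nrow(df)), prior_sum, numeric(1))

df$y_check[25]
```

This is essentially what the expression in the question was reaching for: the comparison has to be made against the current row's city and week (df$city[i], df$week[i]) rather than against the whole column.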

Answer 2

One option with data.table:

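library(data.table)
# Per city: running total of x shifted down one row, i.e. the sum of earlier rows (assumes rows sorted by week).
setDT(df)[, y := c(0, cumsum(x[-length(x)])), by = 'city']
df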
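
If no extra package is wanted, the same idea can be sketched in base R with ave(). This is a rough equivalent rather than a translation of the data.table call: it sorts by city and week first, so "earlier rows" really means "earlier weeks", and it assumes one row per city and week, as in the example data.

```r
# Rebuild the example data from the question so the block is self-contained.
city <- rep(c("ARI", "ATL", "BAL", "BUF", "CAR"), times = 5)
week <- rep(1:5, each = 5)
x <- c(6, 3, 9, 12, 4, 3, 7, 8, 2, 12, 15, 6, 3, 9, 0,
       14, 18, 2, 21, 15, 17, 9, 10, 1, 22)
df <- data.frame(city, week, x)

# Sort by city and week so that previous rows correspond to earlier weeks.
df <- df[order(df$city, df$week), ]

# Within each city, the running total minus the current value equals the
# sum of all earlier weeks.
df$y <- ave(df$x, df$city, FUN = function(v) cumsum(v) - v)

df[df$city == "CAR" & df$week == 5, ]
```

Note that sorting changes the row order, so the CAR, week 5 observation is looked up by its values here rather than by position 25.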

SumIfs in R - creating a subset from multiple conditions and summing a specific column
