I have panel data similar to the following:
city <- c("ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR", "ARI", "ATL", "BAL", "BUF", "CAR")
week <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5)
df <- data.frame(city, week)  # keeps week numeric; cbind() would coerce both columns to character
df$x <- c(6, 3, 9, 12, 4, 3, 7, 8, 2, 12, 15, 6, 3, 9, 0, 14, 18, 2, 21, 15, 17, 9, 10, 1, 22)
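I want to create a new variable, df$y, that for each city and week gives the sum of df$x over all weeks before the week of the current observation. For example, df$y[25] should equal 31, because sum(df[df$city == "CAR" & df$week < 5, 3]) equals 31.
My question is: how can I automate this in a function? Calling sum(df[df$city == "CAR" & df$week < 5, 3]) for every team and week combination would be tedious. My natural inclination is to write something like df$y <- sum(df[df$city == df$city & df$week < df$week, 3]), but that makes no sense. I am new to R and don't fully understand functions; still, is this the best route for what I'm trying to do?
Thanks for your help!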
Answer 0 (score: 0)
Using dplyr:
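library(dplyr)
# Per city: shift x down one row, then take the running total.
# Assumes rows are already ordered by week within each city.
res <- df %>%
  group_by(city) %>%
  mutate(y = cumsum(lag(x, default = 0)))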
res[25,]
# A tibble: 1 x 4
# Groups: city [1]
# city week x y
# <fctr> <dbl> <dbl> <dbl>
#1 CAR 5 22 31
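To see why lagging and then taking the cumulative sum gives "everything before this week", here is the CAR series worked by hand in base R. This is only an illustrative sketch: car_x, lagged and y are names made up for the example, and the values are CAR's x for weeks 1 to 5 copied from the question.

```r
# CAR's x values for weeks 1..5, taken from the question's data
car_x <- c(4, 12, 0, 15, 22)

# Shift down by one position, padding the first week with 0
# (the same idea as lag(x, default = 0) in dplyr)
lagged <- c(0, head(car_x, -1))

# The running total of the shifted series is the sum of all earlier weeks
y <- cumsum(lagged)
y
```

The last element is 4 + 12 + 0 + 15 = 31, which matches the value the question expects for df$y[25].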
Answer 1 (score: 0)
Using data.table:
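library(data.table)
# Per city: running total of x excluding the current row (rows assumed ordered by week)
setDT(df)[, y := c(0, cumsum(x[-length(x)])), by = 'city']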
df
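If you would rather not load a package, the same idea can be sketched in base R with ave(). This is one possible approach rather than the only one, and it assumes a single row per city and week. The data frame is rebuilt here so the snippet runs on its own; brute is a hypothetical name for a check vector that applies the question's own sum(...) definition to every row.

```r
# Rebuild the example data from the question
city <- rep(c("ARI", "ATL", "BAL", "BUF", "CAR"), times = 5)
week <- rep(1:5, each = 5)
x <- c(6, 3, 9, 12, 4, 3, 7, 8, 2, 12, 15, 6, 3, 9, 0,
       14, 18, 2, 21, 15, 17, 9, 10, 1, 22)
df <- data.frame(city, week, x, stringsAsFactors = FALSE)

# Sort so that "earlier rows" within a city really are earlier weeks
df <- df[order(df$city, df$week), ]

# Within each city: running total minus the current value
# equals the sum over strictly earlier weeks
df$y <- ave(df$x, df$city, FUN = function(v) cumsum(v) - v)

# Cross-check against the definition in the question, row by row
brute <- mapply(function(ct, wk) sum(df$x[df$city == ct & df$week < wk]),
                df$city, df$week)
stopifnot(all(df$y == brute))

df[df$city == "CAR" & df$week == 5, ]
```

The explicit order() call is there because every cumulative-sum approach depends on row order. The brute-force check is slower, but it follows the original sum(df[df$city == ... & df$week < ..., 3]) expression directly, which makes it a handy sanity test.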