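I am trying to add lagged-variable columns to my data frame. I am running into trouble because I have several groups (countries, in my example) and I want the lags computed within each group.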
library(tidyverse)
df <- tribble(
~year, ~country, ~variable,
#--|--|----
1997, "USA", 28,
1998, "USA", 40,
1999, "USA", 30,
2000, "USA", 39,
2001, "USA", 55,
2002, "USA", 53,
2003, "USA", 64,
2004, "USA", 40,
2005, "USA", 30,
2006, "USA", 39,
2007, "USA", 55,
2008, "USA", 53,
2009, "USA", 71,
2010, "USA", 44,
2011, "USA", 40,
2012, "USA", 17,
2013, "USA", 39,
2014, "USA", 55,
2015, "USA", 53,
1997, "France", 13,
1998, "France", 42,
1999, "France", 37,
2000, "France", 11,
2001, "France", 55,
2002, "France", 53,
2003, "France", 31,
2004, "France", 10,
2005, "France", 30,
2006, "France", 37,
2007, "France", 54,
2008, "France", 58,
2009, "France", 50,
2010, "France", 40,
2011, "France", 49,
2012, "France", 14,
2013, "France", 34,
2014, "France", 53,
2015, "France", 50
)
nlags <- 1:10
df_lags <- map(.x = nlags,
.f = ~ lag(df$variable, .x)) %>%
as.data.frame
names(df_lags) <- paste0("lag_", nlags)
df <- df %>%
bind_cols(df_lags)
This is roughly right, but the lag also carries over across groups! So afterwards, row 20 looks like this:
---------------------------------
| 1997 | France | 13 | 53 | ... |
---------------------------------
But the 53 is taken from the USA group, whereas it should just be NA.
I tried this:
df %>%
group_by(country) %>%
map(.x = nlags,
.f = ~ lag(variable, .x))
But this does not work:
Error in lag(variable, .x) : object 'variable' not found
Any ideas?
Answer 0 (score: 3)
Using data.table:
library(data.table)
setDT(df)[, paste0("lag_", nlags) := shift(variable, nlags), country]
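To check that the grouped shift does not leak values across countries, here is a minimal sketch on a cut-down version of the data (the first three years of each country from the question). The names `dt` and `lag_cols` are only illustrative, and only two lags are used to keep the output small.

```r
library(data.table)

# Cut-down sample: first three years per country from the question's data
dt <- data.table(
  year     = c(1997, 1998, 1999, 1997, 1998, 1999),
  country  = c("USA", "USA", "USA", "France", "France", "France"),
  variable = c(28, 40, 30, 13, 42, 37)
)

nlags    <- 1:2
lag_cols <- paste0("lag_", nlags)  # illustrative name for the new columns

# shift() with a vector of lags returns one column per lag;
# by = country restarts the lag at the first row of each group
dt[, (lag_cols) := shift(variable, nlags), by = country]

print(dt)
```

The first France row should then have NA in both lag columns rather than a value carried over from USA.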
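Answer 1 (score: 2)
This may be useful. We can split the data frame by country, perform the same operation on each country's subset, and then combine the results. df2 is the final output.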
library(tidyverse)
nlags <- 1:10
df2 <- df %>%
split(.$country) %>%
map_dfr(function(df){
df_lags <- map(nlags, ~lag(df$variable, .x)) %>%
as.data.frame() %>%
setNames(paste0("lag_", nlags))
# return the country's rows with its lag columns attached
bind_cols(df, df_lags)
})
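The same per-country idea can probably also be written without an explicit split, by letting group_by() do the grouping and mutate() add the columns. This is a sketch of an alternative, not the code above: it assumes dplyr 1.0 or later for across(), and `df_small` and `lag_funs` are illustrative names. The sample is the first three years of each country from the question.

```r
library(dplyr)
library(purrr)

# Cut-down sample: first three years per country from the question's data
df_small <- tibble(
  year     = c(1997, 1998, 1999, 1997, 1998, 1999),
  country  = c("USA", "USA", "USA", "France", "France", "France"),
  variable = c(28, 40, 30, 13, 42, 37)
)

nlags <- 1:2

# One lag function per lag length, named lag_1, lag_2, ...
lag_funs <- map(nlags, function(n) {
  force(n)  # fix the value of n inside the closure
  function(x) dplyr::lag(x, n)
}) %>%
  set_names(paste0("lag_", nlags))

# Inside a grouped mutate(), each function only sees one country's rows
out <- df_small %>%
  group_by(country) %>%
  mutate(across(variable, lag_funs, .names = "{.fn}")) %>%
  ungroup()

print(out)
```

Unlike the split-and-bind version, this keeps the original row order, since nothing is reassembled group by group.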