如何在year_month面板数据中生成滞后变量

时间:2016-07-24 01:07:06

标签: r panel lag zoo

我想为每月的面板数据创建滞后变量。我的示例数据集如下所示。

Time    ID  Value1
Jan-14  A     12
Feb-14  A     14
Mar-14  A     15
Apr-14  A     18
May-14  A     10
Jun-14  A     12
Jul-14  A     13
Aug-14  A     14
Jan-14  B     32
Feb-14  B     14
Mar-14  B     15
Apr-14  B     18
May-14  B     20
Jun-14  B     12
Jul-14  B     13
Aug-14  B     14

我想为每个ID创建2个新的lag-1和lag-1变量。我做了类似下面的事情,但它仅适用于ID:A和其余的ID需要滞后以前的ID。

library(zoo)
library(plm)
df["Time1"] <- as.yearmon(df$Time,format="%b-%y")
pdf1 <-pdata.frame(df,index=c("ID","Time1"))
pdf1$lag1_Value1<-lag((pdf1$Value1), 1)

但我理想情况下需要我的输出如下。

Time    ID   Value1   lag1.Value1  lag2_Value1
Jan-14  A     12          NA          NA
Feb-14  A     14          12          NA
Mar-14  A     15          14          12
Apr-14  A     18          15          14
May-14  A     10          18          15
Jun-14  A     12          10          18
Jul-14  A     13          12          10
Aug-14  A     14          13          12
Jan-14  B     32          NA          NA
Feb-14  B     14          32          NA
Mar-14  B     15          14          32
Apr-14  B     18          15          14
May-14  B     20          18          15
Jun-14  B     12          20          18
Jul-14  B     13          12          20
Aug-14  B     14          13          12

我的示例数据集在以下代码中给出。

df=structure(list(Time = structure(c(4L, 3L, 7L, 1L, 8L, 6L, 5L, 
2L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L), .Label = c("Apr-14", "Aug-14", 
"Feb-14", "Jan-14", "Jul-14", "Jun-14", "Mar-14", "May-14"), class ="factor"), 
ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
Value1 = c(12L, 14L, 15L, 18L, 10L, 12L, 13L, 14L, 32L, 14L, 
15L, 18L, 20L, 12L, 13L, 14L), Time1 = structure(c(2014, 
2014.08333333333, 2014.16666666667, 2014.25, 2014.33333333333, 
2014.41666666667, 2014.5, 2014.58333333333, 2014, 2014.08333333333, 
2014.16666666667, 2014.25, 2014.33333333333, 2014.41666666667, 
2014.5, 2014.58333333333), class = "yearmon")), row.names = c(NA, 
-16L), .Names = c("Time", "ID", "Value1", "Time1"), class = "data.frame") 

感谢您的帮助。

2 个答案:

答案 0 :(得分:1)

使用dplyr

library(dplyr)
df %>%  group_by(ID) %>%
    mutate(lag1.Value1 = lag(Value1), lag2_Value1 = lag(lag1.Value1))

<强>输出

Source: local data frame [16 x 6]
Groups: ID [2]

#     Time     ID Value1    Time1 lag1.Value1 lag2_Value1
#   <fctr> <fctr>  <int>    <dbl>       <int>       <int>
#1  Jan-14      A     12 2014.000          NA          NA
#2  Feb-14      A     14 2014.083          12          NA
#3  Mar-14      A     15 2014.167          14          12
#4  Apr-14      A     18 2014.250          15          14
#5  May-14      A     10 2014.333          18          15
#6  Jun-14      A     12 2014.417          10          18
#7  Jul-14      A     13 2014.500          12          10
#8  Aug-14      A     14 2014.583          13          12
#9  Jan-14      B     32 2014.000          NA          NA
#10 Feb-14      B     14 2014.083          32          NA
#11 Mar-14      B     15 2014.167          14          32
#12 Apr-14      B     18 2014.250          15          14
#13 May-14      B     20 2014.333          18          15
#14 Jun-14      B     12 2014.417          20          18
#15 Jul-14      B     13 2014.500          12          20
#16 Aug-14      B     14 2014.583          13          12

使用data.table

library(data.table)

setDT(df)[, lag1.Value1:= shift(Value1), ID][, lag2_Value1 := shift(lag1.Value1), ID]

答案 1 :(得分:0)

使用data.table,我们可以在一个步骤中执行此操作,因为shift采用“n”序列

library(data.table)
setDT(df)[, paste0("lag", 1:2, "_Value1") := shift(Value1, 1:2), by = ID]
df
#      Time ID Value1    Time1 lag1_Value1 lag2_Value1
# 1: Jan-14  A     12 2014.000          NA          NA
# 2: Feb-14  A     14 2014.083          12          NA
# 3: Mar-14  A     15 2014.167          14          12
# 4: Apr-14  A     18 2014.250          15          14
# 5: May-14  A     10 2014.333          18          15
# 6: Jun-14  A     12 2014.417          10          18
# 7: Jul-14  A     13 2014.500          12          10
# 8: Aug-14  A     14 2014.583          13          12
# 9: Jan-14  B     32 2014.000          NA          NA
#10: Feb-14  B     14 2014.083          32          NA
#11: Mar-14  B     15 2014.167          14          32
#12: Apr-14  B     18 2014.250          15          14
#13: May-14  B     20 2014.333          18          15
#14: Jun-14  B     12 2014.417          20          18
#15: Jul-14  B     13 2014.500          12          20
#16: Aug-14  B     14 2014.583          13          12