12-month lag in unbalanced panel data

Posted: 2014-10-07 15:22:40

Tags: r dataframe

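I'm trying to add a new variable to my panel data: Plasma_mean lagged by 12 months. The Plasma_mean series starts 12 months before the other observations, which is why the other variables are NA at the head of the dataset.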

       ProdGrp timeperiod Plasma_mean Mark.Invest_mean Reps_mean repcost_mean Sales_sum Pcs_vol_sum
  1:             1/1/2003      948881               NA        NA           NA        NA          NA
  2:             2/1/2003      787974               NA        NA           NA        NA          NA
  3:             3/1/2003      872733               NA        NA           NA        NA          NA
  4:             4/1/2003      932405               NA        NA           NA        NA          NA
  5:             5/1/2003      922127               NA        NA           NA        NA          NA
 ---
155: Product A   4/1/2010     1325862         36362.49      1.33     14436.66  168874.9         718
156: Product B   5/1/2010     1253672         53821.38      8.17     14336.67 1989798.9        4549
157: Product A   5/1/2010     1253672         37146.27      1.33     14436.66  152519.5         596
158: Product B   6/1/2010     1334744         69749.48      8.17     14336.67 1978877.4        4612
159: Product A   6/1/2010     1334744         38093.63      1.33     14436.66  164404.0         689

     gProt_vol_sum pckg_price_mean g_Prot_price_mean TotalpharmaBiosales_mean dollarized_reps_mean      dates
  1:            NA              NA                NA                       NA                   NA 2003-01-01
  2:            NA              NA                NA                       NA                   NA 2003-02-01
  3:            NA              NA                NA                       NA                   NA 2003-03-01
  4:            NA              NA                NA                       NA                   NA 2003-04-01
  5:            NA              NA                NA                       NA                   NA 2003-05-01
 ---
155:        2378.5        191.0250          76.88328                  6023500             19200.76 2010-04-01
156:       40109.5        288.6149          49.80379                  6135394                30.59 2010-05-01
157:        2204.0        187.4431          76.11616                  6135394             19200.76 2010-05-01
158:       41776.0        298.1715          55.74162                  8673498            117130.59 2010-06-01
159:        2305.5        190.6980          76.77850                  8673498             19200.76 2010-06-01

     plasma_lagged
  1:            NA
  2:            NA
  3:            NA
  4:            NA
  5:            NA
 ---
155:            NA
156:            NA
157:            NA
158:            NA
159:            NA

Using the data.table package, I did:

lag <- function(Plasma_mean, n = 12L, along_with){
  # position of the row whose along_with value equals along_with - n
  index <- match(along_with - n, along_with, incomparables = NA)
  out <- Plasma_mean[index]
  attributes(out) <- attributes(Plasma_mean)
  out
}
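A quick check of what `along_with - n` produces when `along_with` holds monthly `Date` values shows why every lookup in this function fails. The dates below are made up for illustration and assume, as in the output above, that every date falls on the 1st of a month.

```r
# Monthly dates on the 1st of each month, like the `dates` column in DT
dates <- seq(as.Date("2009-01-01"), by = "month", length.out = 18)

# Subtracting a number from a Date subtracts days, not months
shifted <- dates - 12L
print(head(shifted, 3))

# No shifted date falls on the 1st of a month, so match() finds nothing
index <- match(shifted, dates)
print(all(is.na(index)))
```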

Then I added it to my dataset, by product group:

DT[, plasma_lagged := lag(Plasma_mean, 12, along_with = dates), by = ProdGrp]

I get the plasma_lagged variable as the last column of my dataset, but it doesn't seem to contain any data (see observations 155 onward).

Any hints on how to fix this would be great.


1 answer:

Answer 0 (score: 0)

You're lagging by 12 days, not 12 months: subtracting a plain number from a Date subtracts days. Try

library(lubridate)
# months(12) is a lubridate Period, so along_with - n now steps back 12 calendar months
DT[, plasma_lagged := lag(Plasma_mean, months(12), along_with = dates), by = ProdGrp]
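If you'd rather not depend on lubridate, a minimal base-R sketch of the same idea matches on a running year-month count instead of on the exact date. `lag_months` is a hypothetical name, and the sketch assumes `along_with` is a `Date` vector with at most one row per month within each group.

```r
# Hypothetical helper: lag x by n calendar months within one group.
# Assumes along_with is a Date vector with at most one row per month.
lag_months <- function(x, n = 12L, along_with) {
  # Running month count, e.g. 2010-06-01 -> 2010 * 12 + 6
  month_id <- as.integer(format(along_with, "%Y")) * 12L +
    as.integer(format(along_with, "%m"))
  # Row holding the value from n months earlier (NA when that month is absent)
  index <- match(month_id - n, month_id)
  out <- x[index]
  attributes(out) <- attributes(x)
  out
}

# Small check on two years of made-up monthly data
dates <- seq(as.Date("2003-01-01"), by = "month", length.out = 24)
x <- seq_along(dates) * 10
lagged <- lag_months(x, 12L, along_with = dates)
print(lagged)
```

With data.table the call would then be `DT[, plasma_lagged := lag_months(Plasma_mean, 12L, along_with = dates), by = ProdGrp]`. Matching on the month count rather than on shifted dates means the lookup does not depend on every date falling on the same day of the month.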

(Please provide code for a reproducible example; otherwise I can't be sure this works.)
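One more thing worth checking: in the printed output the earliest rows, which hold the first 12 months of Plasma_mean, have an empty ProdGrp, and Plasma_mean is identical for Product A and Product B on the same date (rows 156 to 159). If Plasma_mean really is a single market-level series, lagging it `by = ProdGrp` cannot reach those early values, so each product's first 12 months would still come out NA. A sketch under that assumption lags the date-level series once and merges it back on the date. The toy data are hypothetical, and base `merge()` stands in for a data.table join.

```r
# Hypothetical toy data: one market-level plasma series that starts
# 12 months before the product rows do
plasma_dates <- seq(as.Date("2003-01-01"), by = "month", length.out = 24)
plasma <- data.frame(dates = plasma_dates,
                     Plasma_mean = 100 + seq_along(plasma_dates))

panel <- data.frame(ProdGrp = rep(c("Product A", "Product B"), each = 12),
                    dates = rep(plasma_dates[13:24], times = 2),
                    stringsAsFactors = FALSE)

# Lag the date-level series once, matching on a running year-month count
month_id <- function(d) as.integer(format(d, "%Y")) * 12L + as.integer(format(d, "%m"))
plasma$plasma_lagged <- plasma$Plasma_mean[match(month_id(plasma$dates) - 12L,
                                                 month_id(plasma$dates))]

# Attach Plasma_mean and its lag to every product row by date
panel <- merge(panel, plasma, by = "dates", all.x = TRUE)
print(head(panel))
```

Because the lag is computed before the products are split apart, every product row from the first product month onward picks up the plasma value from 12 months earlier.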