Adstock transformation using plm in R

Time: 2015-04-30 08:37:53

Tags: r

The dataset I am working with looks like this:

> head(Data,30)
   str_num      wdate     Variable
1        3 2010-01-31      0.00000
2        3 2010-02-07      0.00000
3        3 2010-02-14      0.00000
4        3 2010-02-21  33772.53865
5        3 2010-02-28  34163.28261
6        3 2010-03-07  54333.65217
7        3 2010-03-14      0.00000
8        3 2010-03-21      0.00000
9        3 2010-03-28      0.00000
10       3 2010-04-04 167099.80640
11       3 2010-04-11 168832.12420
12       3 2010-04-18 164872.31820
13       3 2010-04-25     17.04348
14       3 2010-05-02     17.04348
15       3 2010-05-09 168916.17500
16       4 2010-01-31      0.00000
17       4 2010-02-07      0.00000
18       4 2010-02-14      0.00000
19       4 2010-02-21  33772.53865
20       4 2010-02-28  34163.28261
21       4 2010-03-07  37786.68358
22       4 2010-03-14      0.00000
23       4 2010-03-21      0.00000
24       4 2010-03-28      0.00000
25       4 2010-04-04 167099.80640
26       4 2010-04-11      0.00000
27       4 2010-04-18 164872.31820
28       4 2010-04-25      0.00000
29       4 2010-05-02      0.00000
30       4 2010-05-09 168916.17500

The panels are defined by str_num. I now want to apply an adstock transformation to the column Variable, following this logic:

y1=x1^Power
y2=(x2+y1*Adstock)^Power
y3=(x3+y2*Adstock)^Power
.......

and so on, where x is the original variable and y is the transformed variable. To implement this with plm, I use the following code:

test_data<-pdata.frame(Data, c("str_num","wdate"))
ind<-"Variable"

ad<-10
pwr<-60
lg<-1

ap <- stats::filter(test_data[,ind], ad/100, method="recursive")^(pwr/100)
test_data[paste(ind,"_",ad,".",pwr,".",lg,sep="")]<-lag(ap,lg)
test_data[,paste(ind,"_",ad,".",pwr,".",lg,sep="")]<- ifelse(is.na(test_data[,paste(ind,"_",ad,".",pwr,".",lg,sep="")]),test_data[,paste(ind,"_",ad,".",pwr,".",lg,sep="")][1],test_data[,paste(ind,"_",ad,".",pwr,".",lg,sep="")])
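
For reference, here is a minimal base R sketch that contrasts the recursion as written in the formulas above with what a `stats::filter(..., method = "recursive")` call followed by a power computes. The helper names `adstock_literal` and `adstock_filter` and the toy vector are assumptions made for illustration; they are not part of the original post. The filter version applies the carry-over to the untransformed series and raises the result to the power afterwards, so the two agree on the first non-zero value and can differ from the next period on.

```r
# Hypothetical helper: applies y_t = (x_t + adstock * y_{t-1})^power literally,
# assuming the recursion starts from y_0 = 0.
adstock_literal <- function(x, adstock, power) {
  y <- numeric(length(x))
  carry <- 0
  for (i in seq_along(x)) {
    y[i] <- (x[i] + adstock * carry)^power
    carry <- y[i]
  }
  y
}

# Hypothetical helper: linear carry-over first, power applied afterwards,
# which is what filter(...)^power evaluates to.
adstock_filter <- function(x, adstock, power) {
  as.numeric(stats::filter(x, adstock, method = "recursive"))^power
}

x <- c(0, 100, 0, 50)  # made-up values
lit <- adstock_literal(x, adstock = 0.1, power = 0.6)
flt <- adstock_filter(x, adstock = 0.1, power = 0.6)
```

Which of the two definitions is intended is a modelling choice; the sketch only makes the difference visible.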

I am facing two problems with my code:

  > head(test_data,30)
                 str_num      wdate     Variable Variable_10.60.1
    3-2010-01-31       3 2010-01-31      0.00000          0.00000
    3-2010-02-07       3 2010-02-07      0.00000          0.00000
    3-2010-02-14       3 2010-02-14      0.00000          0.00000
    3-2010-02-21       3 2010-02-21  33772.53865        521.36062
    3-2010-02-28       3 2010-02-28  34163.28261        555.52075
    3-2010-03-07       3 2010-03-07  54333.65217        721.85594
    3-2010-03-14       3 2010-03-14      0.00000        181.32201
    3-2010-03-21       3 2010-03-21      0.00000         45.54603
    3-2010-03-28       3 2010-03-28      0.00000         11.44065
    3-2010-04-04       3 2010-04-04 167099.80640       1360.80102
    3-2010-04-11       3 2010-04-11 168832.12420       1448.99305
    3-2010-04-18       3 2010-04-18 164872.31820       1439.05493
    3-2010-04-25       3 2010-04-25     17.04348        361.67574
    3-2010-05-02       3 2010-05-02     17.04348         91.35392
    3-2010-05-09       3 2010-05-09 168916.17500       1370.52966
    4-2010-01-31       4 2010-01-31      0.00000        344.26149
    4-2010-02-07       4 2010-02-07      0.00000         86.47458
    4-2010-02-14       4 2010-02-14      0.00000         21.72143
    4-2010-02-21       4 2010-02-21  33772.53865        521.51723
    4-2010-02-28       4 2010-02-28  34163.28261        555.53576
    4-2010-03-07       4 2010-03-07  37786.68358        590.31739
    4-2010-03-14       4 2010-03-14      0.00000        148.28103
    4-2010-03-21       4 2010-03-21      0.00000         37.24651
    4-2010-03-28       4 2010-03-28      0.00000          9.35590
    4-2010-04-04       4 2010-04-04 167099.80640       1360.79294
    4-2010-04-11       4 2010-04-11      0.00000        341.81573
    4-2010-04-18       4 2010-04-18 164872.31820       1358.05197
    4-2010-04-25       4 2010-04-25      0.00000        341.12723
    4-2010-05-02       4 2010-05-02      0.00000         85.68729
    4-2010-05-09       4 2010-05-09 168916.17500       1370.43844
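  1. For the first three dates of panel 4, the value of Variable is 0, but the transformed value is not 0. This is because plm seems to treat the first three dates as part of the previous panel, and panel 4 is only treated as a new panel once the non-zero values start. How can I overcome this?

  2. When I apply a lag, it is only applied at the first data point and nowhere else. With one or more lags, I would like the resulting NAs to be replaced by the first value of the panel. How can I achieve this?

  Any help is greatly appreciated.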
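
Both symptoms can be reproduced on a small scale. The sketch below uses made-up toy data and rests on the assumption that the cause lies in the base functions rather than in plm itself: `stats::filter` runs over the whole stacked column without any notion of panels, and `stats::lag` on a time series only shifts the time index while leaving the values in place.

```r
# Toy data: two panels stacked in one vector (values are made up).
x  <- c(0, 100, 50, 0, 0, 80)
id <- c(1,   1,  1, 2, 2,  2)

# Problem 1: filtering the stacked column carries panel 1 over into panel 2.
stacked <- as.numeric(stats::filter(x, 0.1, method = "recursive"))

# Filtering each panel separately restarts the recursion at every panel.
by_panel <- ave(x, id, FUN = function(v) {
  as.numeric(stats::filter(v, 0.1, method = "recursive"))
})

# Problem 2: stats::lag() on a ts changes the time index only,
# so the underlying values stay in the same positions.
ap <- stats::filter(x, 0.1, method = "recursive")
same_values <- identical(as.numeric(stats::lag(ap, 1)), as.numeric(ap))
```

In `stacked`, the first two rows of panel 2 are non-zero although the input is 0 there, which mirrors the output shown above; in `by_panel` they stay at 0.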

1 answer:

Answer 0 (score: 2)

It is not the most elegant solution, but this should solve your problem.

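library(dplyr)

data <- read.csv("apl.csv", header = TRUE)

ad  <- 10        # adstock (carry-over) rate, in percent
pwr <- 60 / 100  # power
lg  <- 1         # number of periods to lag

data <- data %>%
  group_by(str_num) %>%
  mutate(
    # run the recursive filter within each panel so nothing carries over between panels
    ap = as.numeric(stats::filter(Variable, ad / 100, method = "recursive"))^pwr,
    # dplyr::lag shifts the values (stats::lag only shifts the time index);
    # the leading gap is filled with the panel's first value
    lag1 = dplyr::lag(ap, lg, default = ap[1])
  ) %>%
  ungroup()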
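
If dplyr is not available, the same per-panel idea can be sketched in base R with `ave`. The helpers `lag_fill_first` and `panel_adstock` and the toy data frame are hypothetical names introduced for illustration only, and the sketch assumes the rows are already sorted by date within each panel. The new column follows the naming pattern used in the question.

```r
# Hypothetical helper: shift a vector down by k positions and
# fill the leading gap with the vector's first value.
lag_fill_first <- function(v, k) {
  if (k <= 0) return(v)
  n <- length(v)
  if (k >= n) return(rep(v[1], n))
  c(rep(v[1], k), v[seq_len(n - k)])
}

# Hypothetical helper: per-panel adstock (filter, then power), then lag.
# ad and pwr are given in percent, as in the question.
panel_adstock <- function(df, var, ad, pwr, lg, panel = "str_num") {
  out <- ave(df[[var]], df[[panel]], FUN = function(v) {
    ap <- as.numeric(stats::filter(v, ad / 100, method = "recursive"))^(pwr / 100)
    lag_fill_first(ap, lg)
  })
  df[[paste0(var, "_", ad, ".", pwr, ".", lg)]] <- out
  df
}

# Made-up toy data with two panels.
toy <- data.frame(
  str_num  = c(3, 3, 3, 4, 4, 4),
  Variable = c(0, 100, 0, 0, 0, 100)
)
res <- panel_adstock(toy, "Variable", ad = 10, pwr = 60, lg = 1)
```

Here `res` gains a column named `Variable_10.60.1`, and the leading zeros of panel 4 stay at zero because the recursion restarts for each value of `str_num`.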