The dataset I am working with looks like this:
> head(Data,30)
str_num wdate Variable
1 3 2010-01-31 0.00000
2 3 2010-02-07 0.00000
3 3 2010-02-14 0.00000
4 3 2010-02-21 33772.53865
5 3 2010-02-28 34163.28261
6 3 2010-03-07 54333.65217
7 3 2010-03-14 0.00000
8 3 2010-03-21 0.00000
9 3 2010-03-28 0.00000
10 3 2010-04-04 167099.80640
11 3 2010-04-11 168832.12420
12 3 2010-04-18 164872.31820
13 3 2010-04-25 17.04348
14 3 2010-05-02 17.04348
15 3 2010-05-09 168916.17500
16 4 2010-01-31 0.00000
17 4 2010-02-07 0.00000
18 4 2010-02-14 0.00000
19 4 2010-02-21 33772.53865
20 4 2010-02-28 34163.28261
21 4 2010-03-07 37786.68358
22 4 2010-03-14 0.00000
23 4 2010-03-21 0.00000
24 4 2010-03-28 0.00000
25 4 2010-04-04 167099.80640
26 4 2010-04-11 0.00000
27 4 2010-04-18 164872.31820
28 4 2010-04-25 0.00000
29 4 2010-05-02 0.00000
30 4 2010-05-09 168916.17500
The panels are defined by str_num. I now want to apply an adstock transformation to the column Variable, following this logic:
y1=x1^Power
y2=(x2+y1*Adstock)^Power
y3=(x3+y2*Adstock)^Power
.......
and so on, where x is the original variable and y is the transformed variable. To implement this with plm, I use the following code:
library(plm)
test_data <- pdata.frame(Data, c("str_num", "wdate"))
ind <- "Variable"
ad  <- 10   # adstock rate, in percent
pwr <- 60   # power, in percent
lg  <- 1    # number of lags
col <- paste(ind, "_", ad, ".", pwr, ".", lg, sep = "")   # "Variable_10.60.1"
ap <- stats::filter(test_data[, ind], ad/100, method = "recursive")^(pwr/100)
test_data[col] <- lag(ap, lg)
# replace NAs with the first value of the new column
test_data[, col] <- ifelse(is.na(test_data[, col]), test_data[, col][1], test_data[, col])
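For reference, the recursion written out above can also be coded literally in base R. This is only a sketch: the helper name adstock_literal and the toy input are made up for illustration and are not part of plm or stats. It applies the power inside the recursion, as in the formulas, whereas filter(...)^(pwr/100) raises the already filtered series to the power once at the end, so the two need not agree.

```r
# Hypothetical helper: y[1] = x[1]^power, y[t] = (x[t] + adstock * y[t-1])^power
adstock_literal <- function(x, adstock, power) {
  y <- numeric(length(x))
  prev <- 0                          # no carry-over before the first period
  for (t in seq_along(x)) {
    y[t] <- (x[t] + adstock * prev)^power
    prev <- y[t]                     # the *powered* value is carried forward
  }
  y
}

x <- c(0, 100, 0, 0)                 # toy input, not taken from the data above

# Power applied inside the recursion (the formulas as written)
literal <- adstock_literal(x, adstock = 0.1, power = 0.6)

# Power applied after the recursive filter (what the plm code does)
filtered <- as.numeric(stats::filter(x, 0.1, method = "recursive"))^0.6

# Both agree at t = 2 (100^0.6) but differ at t = 3: about 1.32 vs about 3.98
print(cbind(x, literal, filtered))
```

Which of the two is intended depends on the adstock definition being used, so it is worth checking against a few hand-computed values.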
I am facing two problems with the plm code above:
> head(test_data,30)
str_num wdate Variable Variable_10.60.1
3-2010-01-31 3 2010-01-31 0.00000 0.00000
3-2010-02-07 3 2010-02-07 0.00000 0.00000
3-2010-02-14 3 2010-02-14 0.00000 0.00000
3-2010-02-21 3 2010-02-21 33772.53865 521.36062
3-2010-02-28 3 2010-02-28 34163.28261 555.52075
3-2010-03-07 3 2010-03-07 54333.65217 721.85594
3-2010-03-14 3 2010-03-14 0.00000 181.32201
3-2010-03-21 3 2010-03-21 0.00000 45.54603
3-2010-03-28 3 2010-03-28 0.00000 11.44065
3-2010-04-04 3 2010-04-04 167099.80640 1360.80102
3-2010-04-11 3 2010-04-11 168832.12420 1448.99305
3-2010-04-18 3 2010-04-18 164872.31820 1439.05493
3-2010-04-25 3 2010-04-25 17.04348 361.67574
3-2010-05-02 3 2010-05-02 17.04348 91.35392
3-2010-05-09 3 2010-05-09 168916.17500 1370.52966
4-2010-01-31 4 2010-01-31 0.00000 344.26149
4-2010-02-07 4 2010-02-07 0.00000 86.47458
4-2010-02-14 4 2010-02-14 0.00000 21.72143
4-2010-02-21 4 2010-02-21 33772.53865 521.51723
4-2010-02-28 4 2010-02-28 34163.28261 555.53576
4-2010-03-07 4 2010-03-07 37786.68358 590.31739
4-2010-03-14 4 2010-03-14 0.00000 148.28103
4-2010-03-21 4 2010-03-21 0.00000 37.24651
4-2010-03-28 4 2010-03-28 0.00000 9.35590
4-2010-04-04 4 2010-04-04 167099.80640 1360.79294
4-2010-04-11 4 2010-04-11 0.00000 341.81573
4-2010-04-18 4 2010-04-18 164872.31820 1358.05197
4-2010-04-25 4 2010-04-25 0.00000 341.12723
4-2010-05-02 4 2010-05-02 0.00000 85.68729
4-2010-05-09 4 2010-05-09 168916.17500 1370.43844
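For the first three dates of panel 4, the value of Variable is 0, but the transformed value is not 0. This seems to be because plm treats those first three dates as part of the previous panel and only considers panel 4 to have started once non-zero values begin. How can I overcome this?
When I apply the lag, it only takes effect at the very first data point and nowhere else. However, when there are one or more lags, I want the resulting NAs to be replaced with the first value of each panel. How can I do that?
Any help is much appreciated.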
Answer 0 (score: 2)
This is not the most elegant solution, but it should solve your problem.
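library(dplyr)
data <- read.csv("apl.csv", header = TRUE)
ad  <- 10        # adstock rate, in percent
pwr <- 60/100    # power
lg  <- 1         # number of periods to lag
data <- data %>%
  group_by(str_num) %>%   # restart the filter within each panel
  mutate(
    # f[t] = Variable[t] + (ad/100) * f[t-1], then raised to pwr;
    # as.numeric() drops the ts class that stats::filter() returns
    ap = as.numeric(stats::filter(Variable, ad/100, method = "recursive"))^pwr,
    # dplyr::lag() shifts the values (stats::lag() only shifts the time index);
    # the leading gap is filled with the panel's first value
    lag1 = dplyr::lag(ap, lg, default = first(ap))
  ) %>%
  ungroup()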
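To check the per-panel behaviour without dplyr or the CSV file, the same idea can be sketched in base R on a toy panel. The data values and the helper name adstock_lag are assumptions made up for this illustration. The point is that the filter restarts for each str_num, and that the lagged column is padded with each panel's own first value.

```r
# Toy panel (made-up values): two stores, four weeks each
toy <- data.frame(
  str_num  = rep(c(3, 4), each = 4),
  Variable = c(20, 100, 0, 0,   0, 0, 50, 0)
)

# Hypothetical helper: adstock and power within ONE panel, then a lag of `lg`
# periods whose leading gap is filled with that panel's first value
adstock_lag <- function(x, ad, pwr, lg) {
  ap <- as.numeric(stats::filter(x, ad, method = "recursive"))^pwr
  n  <- length(ap)
  c(rep(ap[1], min(lg, n)), head(ap, max(n - lg, 0)))
}

# ave() calls the helper once per str_num group and returns the results
# in the original row order
toy$lag1 <- ave(toy$Variable, toy$str_num,
                FUN = function(x) adstock_lag(x, ad = 0.1, pwr = 0.6, lg = 1))

print(toy)
```

In this toy example the first three rows of store 4 stay at 0 even though store 3 ends with a non-zero carry-over, which is the behaviour asked for in the first problem.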