我正在尝试组合log-log模型的因变量的拟合值。我的数据集是一个不平衡的面板。我试图按照here所示的方式进行。但我的问题不同,因为我已经将我的大数据集转换为plm对象,并且它是一个日志因变量。
我可以通过以下代码访问我的简单数据集。
dat = structure(list(Time = structure(c(9L, 7L, 15L, 1L, 17L, 13L,
11L, 3L, 23L, 21L, 19L, 5L, 10L, 8L, 16L, 2L, 18L, 14L, 12L,
4L, 24L, 22L, 20L, 6L), .Label = c("Apr-00", "Apr-01", "Aug-00",
"Aug-01", "Dec-00", "Dec-01", "Feb-00", "Feb-01", "Jan-00", "Jan-01",
"Jul-00", "Jul-01", "Jun-00", "Jun-01", "Mar-00", "Mar-01", "May-00",
"May-01", "Nov-00", "Nov-01", "Oct-00", "Oct-01", "Sep-00", "Sep-01"
), class = "factor"), Firm = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Out = c(161521L,
142452L, 365697L, 355789L, 376843L, 258762L, 255447L, 188545L,
213663L, 273209L, 317468L, 238668L, 241286L, 135288L, 363609L,
318472L, 446279L, 390230L, 118945L, 174887L, 183770L, 197832L,
317468L, 238668L), Lab = c(261L, 334L, 156L, 134L, 159L, 119L,
41L, 247L, 251L, 62L, 525L, 217L, 298L, 109L, 7L, NA, 0L, 50L,
143L, 85L, 80L, 214L, 525L, 217L), Cap = c(13L, 15L, 14L, 12L,
15L, 12L, 45L, 75L, NA, 12L, 15L, 16L, 42L, 45L, 24L, 56L, 12L,
12L, 45L, NA, 15L, 12L, 15L, 16L)), .Names = c("Time", "Firm",
"Out", "Lab", "Cap"), class = "data.frame", row.names = c(NA,
-24L))
我的数据集如下所示,缺少预测变量数据。
+--------+------+--------+-----+-----+ | Time | Firm | Out | Lab | Cap | +--------+------+--------+-----+-----+ | Jan-00 | A | 161521 | 261 | 13 | | Feb-00 | A | 142452 | 334 | 15 | | Mar-00 | A | 365697 | 156 | 14 | | Apr-00 | A | 355789 | 134 | 12 | | May-00 | A | 376843 | 159 | 15 | | Jun-00 | A | 258762 | 119 | 12 | | Jul-00 | A | 255447 | 41 | 45 | | Aug-00 | A | 188545 | 247 | 75 | | Sep-00 | A | 213663 | 251 | | | Oct-00 | A | 273209 | 62 | 12 | | Nov-00 | A | 317468 | 525 | 15 | | Dec-00 | A | 238668 | 217 | 16 | | Jan-01 | B | 241286 | 298 | 42 | | Feb-01 | B | 135288 | 109 | 45 | | Mar-01 | B | 363609 | 7 | 24 | | Apr-01 | B | 318472 | | 56 | | May-01 | B | 446279 | 0 | 12 | | Jun-01 | B | 390230 | 50 | 12 | | Jul-01 | B | 118945 | 143 | 45 | | Aug-01 | B | 174887 | 85 | | | Sep-01 | B | 183770 | 80 | 15 | | Oct-01 | B | 197832 | 214 | 12 | | Nov-01 | B | 317468 | 525 | 15 | | Dec-01 | B | 238668 | 217 | 16 | +--------+------+--------+-----+-----+
我可以使用以下代码获取拟合值
library(zoo)
library(plm)
Sys.setlocale("LC_TIME", "English")
dat["time1"] <- as.yearmon(dat$Time,format="%b-%y")
pdat <-pdata.frame(dat,index=c("Firm","time1"))
Model1<- plm(log(Out) ~ lag(log(Cap), 1) + log(Lab + 1),
model = "within", data=pdat)
summary(Model1)
library(data.table)
FV_Log <- data.table(Model1$model[[1]] - Model1$residuals)
但是对pdat的观察是24次观察,FV_Log是19次观测,所以我无法将其合并到pdat。我的pdat很大,有几千个观察,我使用代码创建了许多变量。因此,非常感谢任何帮助将适合的值合并到原始pdat中(不改变顺序)。
答案 0 :(得分:1)
你面临的问题是双重的。首先,在计算残差时,你的模型正在下降NA
,所以你的结果自然会有更少的元素。第二个问题是你的一个协变量滞后,这为你的第一次观察产生NA
(首先是暂时的) - 所有这就是滞后所做的。要解决这个问题,您需要前一段时间的其他数据,否则您必须放弃该观察。假设您无法访问上一时间段的数据,我就可以解决此问题。
#First I would create a new variable for CAP and just lag and log that separately, rather than applying the function in the formula of the model itself
pdat$Cap.lag.ln<-lag(log(pdat$Cap), 1)
pdat$Cap<-NULL #deleting the old variable to clear up the mess
#Dont necessarily need the na.omit but it couldn't hurt...
Model1<- plm(log(Out) ~ Cap.lag.ln + log(Lab + 1),
model = "within", data=pdat, na.omit=TRUE)
FV_Log <- data.table(Model1$model[[1]] - Model1$residuals)
#Now this is where you reduce your original dataset (pdat) by getting rid of the NAs
pdat2<-na.omit(pdat)
#You will notice that they're the same dimensions now and you can cbind
pdat3 <-cbind(pdat2,FV_Log)
Time Firm Out Lab time1 Cap.lag.ln V1
1: Feb-00 A 142452 334 Feb 2000 2 12.41211
2: Mar-00 A 365697 156 Mar 2000 2 12.54861
3: Apr-00 A 355789 134 Apr 2000 2 12.57580
4: May-00 A 376843 159 May 2000 2 12.54520
5: Jun-00 A 258762 119 Jun 2000 2 12.59702
6: Jul-00 A 255447 41 Jul 2000 2 12.78611
7: Aug-00 A 188545 247 Aug 2000 3 12.28887
8: Sep-00 A 213663 251 Sep 2000 4 12.10858
9: Nov-00 A 317468 525 Nov 2000 2 12.33084
10: Dec-00 A 238668 217 Dec 2000 2 12.48949
11: Feb-01 B 135288 109 Feb 2001 3 12.20776
12: Mar-01 B 363609 7 Mar 2001 3 12.67984
13: May-01 B 446279 0 May 2001 4 12.87698
14: Jun-01 B 390230 50 Jun 2001 2 12.52360
15: Jul-01 B 118945 143 Jul 2001 2 12.33665
16: Aug-01 B 174887 85 Aug 2001 3 12.25209
17: Oct-01 B 197832 214 Oct 2001 2 12.26445
18: Nov-01 B 317468 525 Nov 2001 2 12.10331
19: Dec-01 B 238668 217 Dec 2001 2 12.26195
如果您想要检索这些NA
,您可以执行以下操作:
pdat3 <-as.data.frame(pdat3)
pdat4<-merge(pdat3, pdat,
by=c("Time","Firm","Out", "Lab","time1"),
all.x=TRUE,all.y=TRUE)
Time Firm Out Lab time1 Cap.lag.ln V1 Cap
1 Apr-00 A 355789 134 Apr 2000 2 12.57580 12
2 Apr-01 B 318472 NA Apr 2001 NA NA 56
3 Aug-00 A 188545 247 Aug 2000 3 12.28887 75
4 Aug-01 B 174887 85 Aug 2001 3 12.25209 NA
5 Dec-00 A 238668 217 Dec 2000 2 12.48949 16
6 Dec-01 B 238668 217 Dec 2001 2 12.26195 16
7 Feb-00 A 142452 334 Feb 2000 2 12.41211 15
8 Feb-01 B 135288 109 Feb 2001 3 12.20776 45
9 Jan-00 A 161521 261 Jan 2000 NA NA 13
10 Jan-01 B 241286 298 Jan 2001 NA NA 42
11 Jul-00 A 255447 41 Jul 2000 2 12.78611 45
12 Jul-01 B 118945 143 Jul 2001 2 12.33665 45
13 Jun-00 A 258762 119 Jun 2000 2 12.59702 12
14 Jun-01 B 390230 50 Jun 2001 2 12.52360 12
15 Mar-00 A 365697 156 Mar 2000 2 12.54861 14
16 Mar-01 B 363609 7 Mar 2001 3 12.67984 24
17 May-00 A 376843 159 May 2000 2 12.54520 15
18 May-01 B 446279 0 May 2001 4 12.87698 12
19 Nov-00 A 317468 525 Nov 2000 2 12.33084 15
20 Nov-01 B 317468 525 Nov 2001 2 12.10331 15
21 Oct-00 A 273209 62 Oct 2000 NA NA 12
22 Oct-01 B 197832 214 Oct 2001 2 12.26445 12
23 Sep-00 A 213663 251 Sep 2000 4 12.10858 NA
24 Sep-01 B 183770 80 Sep 2001 NA NA 15