Undefined columns selected in the plm() function

Time: 2019-11-24 23:12:56

Tags: r

I ran into a strange problem with the plm() function. Here is the code:


library(data.table)
library(tidyverse)
library(plm)

# Data generation
n <- 500
set.seed(75080)
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 50
y <- -100*z + 1100 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200, 200, y)
y <- ifelse(y>1600, 1600, y)
dt1 <- data.table('id'=1:500, 'sat'=y, 'income'=x, 'group'=rep(1,n))
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 80
y <- -80*z + 1200 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200, 200, y)
y <- ifelse(y>1600, 1600, y)
dt2 <- data.table('id'=501:1000, 'sat'=y, 'income'=x, 'group'=rep(2,n))
z <- rnorm(n)
w <- rnorm(n)
x <- 5*z + 30
y <- -120*z + 1000 + 50*w
y <- 10*round(y/10)
y <- ifelse(y<200, 200, y)
y <- ifelse(y>1600, 1600, y)
dt3 <- data.table('id'=1001:1500, 'sat'=y, 'income'=x, 'group'=rep(3,n))
dtable <- merge(dt1, dt2, all=TRUE)
dtable <- merge(dtable, dt3, all=TRUE)

# Model
dtable_p <- pdata.frame(dtable, index = "group")
mod_1 <- plm(sat ~ income, data = dtable_p, model = "pooling")

Error in `[.data.frame`(x, , which) : undefined columns selected

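I checked every possibility I could think of, but I can't figure out why it gives me an error. The column names are correct, so why does R say undefined columns were selected? Thanks!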

Follow-up: I added a test with another dataset, as @StupidWolf did to demonstrate the [.data.frame error.
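The three data-generation blocks in the question differ only in their parameters. A base-R sketch of a shorter equivalent, using a hypothetical helper make_group() that is not from the question: it draws z and then w in the same order as the original, so with the same seed it should produce the same values, but it returns a plain data.frame stacked with rbind() instead of data.tables combined with merge().

```r
# Hypothetical helper (not from the question): builds the rows for one group.
# z is drawn before w, matching the order of the original code.
make_group <- function(ids, group, x_mean, slope, intercept, n = length(ids)) {
  z <- rnorm(n)
  w <- rnorm(n)
  x <- 5 * z + x_mean
  y <- slope * z + intercept + 50 * w
  y <- 10 * round(y / 10)
  y <- pmin(pmax(y, 200), 1600)  # same clamping as the two ifelse() calls
  data.frame(id = ids, sat = y, income = x, group = group)
}

set.seed(75080)
dtable <- rbind(
  make_group(1:500,     1, x_mean = 50, slope = -100, intercept = 1100),
  make_group(501:1000,  2, x_mean = 80, slope = -80,  intercept = 1200),
  make_group(1001:1500, 3, x_mean = 30, slope = -120, intercept = 1000)
)
str(dtable)
```

The column is still called group here, so passing this data to pdata.frame(dtable, index = "group") would be expected to hit the same error.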

1 answer:

Answer 0 (score: 1)

This is very strange, but the answer is that the index cannot be named "group".

I suspect that somewhere inside plm() a "group" column must be added to your data.frame.
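The error message itself comes from base R's [.data.frame method, which stops with "undefined columns selected" whenever code asks a data.frame for a column name it does not contain. A minimal base-R illustration of that failure mode (it does not show where inside plm the lookup happens; df and its columns are made up for the example):

```r
# Base R only: asking a data.frame for a column it does not have
# raises the same error reported in the question.
df <- data.frame(id = 1:3, region = c("a", "b", "c"))
msg <- tryCatch(df[, "group"], error = function(e) conditionMessage(e))
print(msg)
```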

We can reproduce this with the Produc example dataset that ships with plm:

data("Produc", package = "plm")
form <- log(gsp) ~ log(pc) 
Produc$group = Produc$region
pProduc <- pdata.frame(Produc, index = c("group"))
summary(plm(form, data = pProduc, model = "random"))
Error in `[.data.frame`(x, , which) : undefined columns selected

With the "region" column that I copied it from, it works fine:

pProduc <- pdata.frame(Produc, index = c("region"))
summary(plm(form, data = pProduc, model = "random"))

Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = form, data = pProduc, model = "random")

Unbalanced Panel: n = 9, T = 51-136, N = 816

Effects:
                  var std.dev share
idiosyncratic 0.03691 0.19213 0.402
individual    0.05502 0.23457 0.598
theta:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8861  0.9012  0.9192  0.9157  0.9299  0.9299 

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.68180 -0.11014  0.00977 -0.00039  0.13815  0.45491 

Coefficients:
             Estimate Std. Error  z-value  Pr(>|z|)    
(Intercept) -1.099088   0.138395  -7.9417 1.994e-15 ***
log(pc)      1.100102   0.010623 103.5627 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    459.71
Residual Sum of Squares: 30.029
R-Squared:      0.93468
Adj. R-Squared: 0.9346
Chisq: 11647.6 on 1 DF, p-value: < 2.22e-16

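For your example, just rename the "group" column, and also convert it to a factor to avoid other errors. (For "pooling", the group should be treated as a non-numeric categorical variable.)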
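The same workaround can be wrapped in a small helper. prepare_index() below is hypothetical (it is not part of plm or data.table); it is a base-R sketch, assuming the data is a plain data.frame, that renames the index column by name rather than by position and converts it to a factor.

```r
# Hypothetical helper (not part of plm): rename the index column away from
# "group" and turn it into a factor before building the pdata.frame.
prepare_index <- function(df, old = "group", new = "GROUP") {
  stopifnot(old %in% names(df), !(new %in% names(df)))
  names(df)[names(df) == old] <- new
  df[[new]] <- factor(df[[new]])
  df
}

# Small made-up data set to show the effect
demo <- data.frame(id = 1:4, sat = c(1000, 1100, 1200, 1300), group = c(1, 1, 2, 2))
demo_p <- prepare_index(demo)
str(demo_p)
```

With the question's data it would be used as pdata.frame(prepare_index(as.data.frame(dtable)), index = "GROUP") before calling plm(); as.data.frame() is there only to keep the helper in plain data.frame territory.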

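dtable <- merge(dt1, dt2, all=TRUE)
dtable <- merge(dtable, dt3, all=TRUE)
# Treat the group id as a categorical variable
dtable$group = factor(dtable$group)
# Rename the index column by name (data.table::setnames) so it is no longer "group"
setnames(dtable, "group", "GROUP")
dtable_p <- pdata.frame(dtable, index = "GROUP")
# plm() selects the estimator through model =, not method =
summary(plm(sat ~ income, data = dtable_p, model = "pooling"))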