Question

With this data input:

A                    B                 C                     D
0.0513748973337      0.442624990365    0.044669941640565     12023787.0495
-0.047511808790502   0.199057057555    0.067542653775225     6674747.75598
0.250333519823608    0.0400359422093   -0.062361320324768    10836244.44
0.033600922318947    0.118359141703    0.048493523722074     7521473.94034
0.00492552770819     0.0851342003243   0.027123088894137     8742685.39098
0.02053037069955     0.0535545969759   0.06352586720282      8442677.4204
0.09050961131549     0.044871795257    0.049363888991624     7223126.70424
0.082789930841618    0.0230375009412   0.090676778601245     8974611.5623
0.06396481119371     0.0467280364963   0.128097065131764     8167179.81463

and this code:

library(plm)
mydata <- read.csv("reproduce_small.csv", sep = "\t")
plm(C ~ log(D), data = mydata, model = "pooling")  # works
plm(A ~ log(B), data = mydata, model = "pooling")  # error

the second plm call returns the following error:

Error in Math.factor(B) : ‘log’ not meaningful for factors

reproduce_small.csv contains the ten lines of data pasted above. Clearly, B is not a factor; it is a numeric vector. That means plm thinks it is a factor. The question is "why?", but more importantly, "how do I fix this?"

Things I have tried:

#1) mydata$B.log <- log(mydata$B) results in

Error in model.frame.default(formula = y ~ X - 1, drop.unused.levels = TRUE) : variable lengths differ (found for 'X')

which is odd in itself, since A and B.log clearly have the same length.

#2) plm(A ~ log(D), data = mydata, model = "pooling") results in the same error as #1.

#3) plm(C ~ log(B), data = mydata, model = "pooling") results in the same original error (log not meaningful for factors).

#4) plm(A ~ log(B + 1), data = mydata, model = "pooling") results in

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In Ops.factor(B, 1) : ‘+’ not meaningful for factors

#5) plm(A ~ as.numeric(as.character(log(B))), data = mydata, model = "pooling") results in the same original error (log not meaningful for factors).

EDIT: As suggested, I am including the result of str(mydata):

> str(mydata)
'data.frame': 9 obs. of 4 variables:
 $ A: num 0.05137 -0.04751 0.25033 0.0336 0.00493 ...
 $ B: num 0.4426 0.1991 0.04 0.1184 0.0851 ...
 $ C: num 0.0447 0.0675 -0.0624 0.0485 0.0271 ...
 $ D: num 12023787 6674748 10836244 7521474 8742685 ...

One further attempt did not work either.

Answer 1

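Helix123 pointed out in the comments that the data.frame should be converted to a pdata.frame. So, for example, a solution for this toy example would be:

mydata$E <- c("x", "x", "x", "x", "x", "y", "y", "y", "y")  # create E as an "index"
mydata <- pdata.frame(mydata, index = "E")  # convert to pdata.frame
plm(A ~ log(B), data = mydata, model = "pooling")  # now it works!

EDIT: As for the "why": as Helix123 pointed out in the comments, when plm is passed a data.frame instead of a pdata.frame, it silently assumes that the first two columns are the indexes and converts them to factors under the hood. plm then throws an unhelpful error, rather than issuing a warning that the object passed is of the wrong type, or not making the assumption at all.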
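The mechanism can be reproduced in base R without plm. The following is a minimal sketch, assuming the implicit index handling has the same effect as calling factor() on the first two columns; it imitates that effect and is not plm's own code. The three-row data frame reuses the first rows of columns A and B from the question, and the names as_index, log_error, wrapped_error and log_ok are made up for illustration.

```r
# Toy data: the first three rows of columns A and B from the question.
mydata <- data.frame(
  A = c(0.0513748973337, -0.047511808790502, 0.250333519823608),
  B = c(0.442624990365, 0.199057057555, 0.0400359422093)
)

# Assumption: imitate the implicit index handling by turning the first two
# columns into factors. This mimics the effect; it is not plm's own code.
as_index <- mydata
as_index[1:2] <- lapply(as_index[1:2], factor)

# log() on a factor fails, which matches the error reported in the question.
log_error <- tryCatch(
  log(as_index$B),
  error = function(e) conditionMessage(e)
)
print(log_error)

# Wrapping the call as in attempt #5 cannot help: log(B) is evaluated first,
# so the error is raised before as.character() or as.numeric() ever run.
wrapped_error <- tryCatch(
  as.numeric(as.character(log(as_index$B))),
  error = function(e) conditionMessage(e)
)

# The untouched numeric column can be logged without any problem.
log_ok <- log(mydata$B)
print(log_ok)
```

This also suggests why the model with C and D works: those columns are in positions three and four, so they are never treated as indexes.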

R plm thinks my numeric vector is a factor, why?
