Error: variables were specified with different types from the fit (nlsw88 data)

Time: 2017-04-23 11:12:01

Tags: r

I am using the nlsw88 data set from Stata and fitting a model:

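library(foreign)  # read.dta() reads Stata .dta files
library(dplyr)    # mutate() is assumed to be dplyr's
h <- read.dta("nlsw88.dta")
# Replace age and wage with their logarithms
h1 <- mutate(h, age = log(age), wage = log(wage))
model2 <- lm(wage ~ age + race + married + never_married + grade + collgrad + industry + union + occupation + hours + ttl_exp + tenure + c_city, data = h1)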

When I try to predict, R gives me an error about "c_city" and "never_married" being factors, and without these variables the prediction does not work properly.

nd<-data.frame(age=log(37),married = "married",union = "union",race = "white",grade = 14,never_married = "1"
               ,collgrad = "college grad",industry = "Manufacturing",
               occupation = "Operatives",hours = 48,ttl_exp = 10, tenure = 5,c_city = "0")
predict(model2,nd)

The data look like this:

> head(h1)
  idcode      age  race married never_married grade         collgrad south smsa c_city               industry occupation
1      1 3.610918 black  single             0    12 not college grad     0 SMSA      0 Transport/Comm/Utility Operatives
2      2 3.610918 black  single             0    12 not college grad     0 SMSA      1          Manufacturing  Craftsmen
3      3 3.737670 black  single             1    12 not college grad     0 SMSA      1          Manufacturing      Sales
4      4 3.761200 white married             0    17     college grad     0 SMSA      0  Professional Services      Other
5      6 3.737670 white married             0    12 not college grad     0 SMSA      0          Manufacturing Operatives
6      7 3.663562 white married             0    12 not college grad     0 SMSA      0  Professional Services      Sales
     union     wage hours   ttl_exp    tenure
1    union 2.462927    48 10.333334  5.333333
2    union 1.856448    40 13.621795  5.250000
3     <NA> 1.612777    40 17.730770  1.250000
4    union 2.200974    42 13.211537  1.750000
5 nonunion 2.089853    48 17.820513 17.750000
6 nonunion 1.532477    30  7.326923  2.250000

What is wrong?

1 answer:

Answer 0 (score: 1)

In the h1 data frame, the variables never_married and c_city are of class integer:

class(h1$never_married)
[1] "integer"    
class(h1$c_city)
[1] "integer"

But in the nd data frame, these variables are of class factor, because they were supplied as quoted strings:

 class(nd$never_married)
 [1] "factor"
 class(nd$c_city)
 [1] "factor"
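
A mismatch like this can be spotted by comparing variable types between the training data and the new data before calling predict. The following is a minimal sketch on made-up toy data; the names `train` and `new_obs` are hypothetical and not from the question. It uses `.MFclass()` from the stats package, which, as far as I know, is the helper that model frames use to record variable types, so integer and double both count as "numeric". `stringsAsFactors = TRUE` is set explicitly to mimic the pre-4.0 R default that turned strings into factors.

```r
# Toy stand-ins for h1 and nd (hypothetical data, for illustration only)
train <- data.frame(c_city = c(0L, 1L, 1L), hours = c(48, 40, 40))
new_obs <- data.frame(c_city = "0", hours = 48, stringsAsFactors = TRUE)

# Model-frame type of each shared column, side by side
shared <- intersect(names(train), names(new_obs))
cmp <- data.frame(
  variable = shared,
  train    = vapply(train[shared], .MFclass, character(1)),
  new_obs  = vapply(new_obs[shared], .MFclass, character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Keep only the variables whose types differ
mismatch <- cmp[cmp$train != cmp$new_obs, ]
print(mismatch)
```

Here only c_city is reported, since hours is numeric in both data frames.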

So never_married and c_city must be given as numbers, not strings, and the code for nd should be:

nd <- data.frame(age=log(37), married="married", union="union",
   race="white", grade=14, never_married=1, collgrad="college grad",
   industry="Manufacturing", occupation="Operatives", hours=48,
   ttl_exp=10, tenure=5, c_city=0)

After these changes, predict gives the following result:

predict(model2,nd)
       1 
1.902962
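
Since nlsw88.dta may not be at hand, the same behaviour can be reproduced with a built-in data set. This is only a sketch under that assumption: mtcars stands in for nlsw88, its numeric 0/1 column am plays the role of c_city, and the names `fit`, `bad`, and `good` are made up for illustration. It also shows how to repair an existing factor column rather than retyping the data frame.

```r
# Built-in mtcars stands in for nlsw88; am is a numeric 0/1 column
fit <- lm(mpg ~ wt + am, data = mtcars)

# New observation with am supplied as a factor, as in the question
bad <- data.frame(wt = 3, am = factor("1"))
msg <- tryCatch(
  {
    predict(fit, bad)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # the message should mention "different types from the fit"

# Repair: go through as.character(), because as.numeric() applied
# directly to a factor returns the internal level codes
good <- bad
good$am <- as.numeric(as.character(good$am))
pred <- predict(fit, good)
print(pred)
```

The detour through as.character() matters for 0/1 variables in particular: a factor with the single level "0" has level code 1, so a direct as.numeric() would silently turn 0 into 1.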