I am using the nlsw88 data from Stata and fitting a model:
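library(foreign)  # provides read.dta()
library(dplyr)    # provides mutate()
h <- read.dta("nlsw88.dta")
h1 <- mutate(h, age = log(age), wage = log(wage))
model2 <- lm(wage ~ age + race + married + never_married + grade + collgrad + industry + union + occupation + hours + ttl_exp + tenure + c_city, data = h1)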
When I try to predict, R gives me an error about the "c_city" and "never_married" factors, and it does not work properly without these variables:
nd<-data.frame(age=log(37),married = "married",union = "union",race = "white",grade = 14,never_married = "1"
,collgrad = "college grad",industry = "Manufacturing",
occupation = "Operatives",hours = 48,ttl_exp = 10, tenure = 5,c_city = "0")
predict(model2,nd)
The variables look like this:
> head(h1)
  idcode      age  race married never_married grade         collgrad south smsa c_city               industry occupation
1      1 3.610918 black  single             0    12 not college grad     0 SMSA      0 Transport/Comm/Utility Operatives
2      2 3.610918 black  single             0    12 not college grad     0 SMSA      1          Manufacturing  Craftsmen
3      3 3.737670 black  single             1    12 not college grad     0 SMSA      1          Manufacturing      Sales
4      4 3.761200 white married             0    17     college grad     0 SMSA      0  Professional Services      Other
5      6 3.737670 white married             0    12 not college grad     0 SMSA      0          Manufacturing Operatives
6      7 3.663562 white married             0    12 not college grad     0 SMSA      0  Professional Services      Sales
     union     wage hours   ttl_exp    tenure
1    union 2.462927    48 10.333334  5.333333
2    union 1.856448    40 13.621795  5.250000
3     <NA> 1.612777    40 17.730770  1.250000
4    union 2.200974    42 13.211537  1.750000
5 nonunion 2.089853    48 17.820513 17.750000
6 nonunion 1.532477    30  7.326923  2.250000
What is going wrong?
Answer 0 (score: 1)
In the h1 data frame, the variables never_married and c_city are of class integer:
class(h1$never_married)
[1] "integer"
class(h1$c_city)
[1] "integer"
But in the nd data frame, these variables are of class factor:
class(nd$never_married)
[1] "factor"
class(nd$c_city)
[1] "factor"
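Rather than checking each column by hand, the classes recorded in the fitted model can be compared with those of the new data. The following is only a minimal sketch on a small made-up data frame: toy, fit, and bad_nd are hypothetical names, not the nlsw88 objects. It relies on the "dataClasses" attribute of the model terms and on stats::.MFclass. Note that in R 4.0.0 and later, where stringsAsFactors defaults to FALSE, a quoted value in data.frame() becomes character rather than factor; either way it no longer matches the numeric class used in the fit.

```r
# Toy data standing in for nlsw88; all names here are illustrative only.
set.seed(1)
toy <- data.frame(
  y      = rnorm(20),
  grade  = sample(10:18, 20, replace = TRUE),
  c_city = rep(0:1, 10)            # integer 0/1, as in h1
)
fit <- lm(y ~ grade + c_city, data = toy)

# New data with c_city supplied as a string, mirroring the question.
bad_nd <- data.frame(grade = 14, c_city = "0")

# Classes the model was fitted with (stored on the terms object).
fitted_classes <- attr(terms(fit), "dataClasses")

# Classes of the new-data columns, using the same classifier as model frames.
new_classes <- vapply(bad_nd, .MFclass, character(1))

# Predictors whose class differs between the fit and the new data.
shared <- intersect(names(fitted_classes), names(new_classes))
mismatched <- shared[fitted_classes[shared] != new_classes[shared]]
print(mismatched)
```

Any name printed here is a column whose type needs to be changed in the new data before calling predict.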
Therefore, the code for nd should be:
nd <- data.frame(age=log(37), married="married", union="union",
race="white", grade=14, never_married=1, collgrad="college grad",
industry="Manufacturing", occupation="Operatives", hours=48,
ttl_exp=10, tenure=5, c_city=0)
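The effect of this change can be reproduced without the Stata file. The sketch below uses a small made-up data frame (toy, fit, nd_bad, nd_good, and nd_fixed are hypothetical names) and assumes, as in h1, that the two indicator variables are stored as integers. It also shows one possible way to repair an existing data frame in place instead of retyping it.

```r
# Toy data with integer 0/1 indicators, loosely mimicking h1.
set.seed(42)
toy <- data.frame(
  y             = rnorm(30),
  never_married = rep(c(0L, 1L, 0L), 10),
  c_city        = rep(c(1L, 0L), 15)
)
fit <- lm(y ~ never_married + c_city, data = toy)

# Quoted values: predict() stops because the types differ from the fit.
nd_bad  <- data.frame(never_married = "1", c_city = "0")
res_bad <- tryCatch(predict(fit, nd_bad), error = function(e) e)
print(inherits(res_bad, "error"))

# Unquoted numeric values: predict() works.
nd_good  <- data.frame(never_married = 1, c_city = 0)
res_good <- predict(fit, nd_good)

# Alternative: convert the existing columns back to numeric.
# as.character() first, so a factor yields its labels, not its codes.
nd_fixed   <- nd_bad
nd_fixed[] <- lapply(nd_fixed, function(col) as.numeric(as.character(col)))
res_fixed  <- predict(fit, nd_fixed)
print(all.equal(res_good, res_fixed))
```

Going through as.character() matters when the column is a factor, because as.numeric() applied directly to a factor returns the internal level codes rather than the values shown.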
After these changes, predict gives the following result:
predict(model2,nd)
1
1.902962
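Since the model was fitted on log(wage), this value is on the log scale. A rough way to bring it back to the original wage scale is to exponentiate it, as sketched below; log_wage_hat and wage_hat are illustrative names. This naive back-transformation is only an approximation, because exp() of a predicted mean log tends to understate the mean on the original scale.

```r
# Value reported by predict(model2, nd) above, on the log-wage scale.
log_wage_hat <- 1.902962

# Naive back-transformation to the original wage scale.
wage_hat <- exp(log_wage_hat)
print(round(wage_hat, 2))   # about 6.71
```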