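Using the car package, I am trying to predict the response variable prestige in the Prestige dataset from income, education, and the factor type, using the lm function. Before fitting the model, I want to scale education and income. If you copy and run the code below in RStudio, the console reports:
Error: variables ‘income’, ‘I(income^2)’, ‘education’, ‘I(education^2)’ were specified with different types from the fit

library(car)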
summary(Prestige)
Prestige$education <- scale(Prestige$education)
Prestige$income <- scale(Prestige$income)
fit <- lm(prestige ~ income + I(income^2) + education + I(education^2)
+ income:education + type + type:income + type:I(income^2)
+ type:education + type:I(education^2)+ type:income:education, Prestige)
summary(fit)
pred <- expand.grid(income = c(1000, 20000), education = c(10,20),type = levels(Prestige $ type))
pred $ prestige.pred <- predict(fit, newdata = pred)
pred
Without scaling the predictors, everything works, so the error must come from scaling before prediction. How can I fix this?
Answer 0: (score: 5)
Note that scale() actually changes the class of the column. Compare:
class(car::Prestige$education)
# [1] "numeric"
class(scale(car::Prestige$education))
# [1] "matrix"
You can safely reduce the scaled columns to plain numeric vectors. c() strips the dimension attribute, so you can use it for this:
Prestige$education <- c(scale(Prestige$education))
Prestige$income <- c(scale(Prestige$income))
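To make the class change concrete, here is a minimal sketch using a small made-up vector (the values of x are purely illustrative and are not taken from Prestige). It checks that scale() returns a one-column matrix carrying extra attributes, and that c() leaves a bare numeric vector.

```r
# Toy numeric vector standing in for a column such as Prestige$education
# (illustrative values only, not real data)
x <- c(6.5, 8.0, 10.2, 12.4, 15.9)

scaled <- scale(x)
is.matrix(scaled)             # scale() hands back a one-column matrix
names(attributes(scaled))     # "dim" plus the stored center and scale

stripped <- c(scaled)         # c() drops the dim and scaling attributes
is.matrix(stripped)
is.null(attributes(stripped))

# as.numeric() is an equivalent way to get the bare vector
identical(stripped, as.numeric(scaled))
```

Either form gives lm() an ordinary numeric column, so the variable types seen at fit time match those supplied later in newdata.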
Then I can run your model with
fit <- lm(prestige ~ income + I(income^2) + education + I(education^2)
+ income:education + type + type:income + type:I(income^2)
+ type:education + type:I(education^2)+ type:income:education,
Prestige, na.action="na.omit")
and the prediction returns
   income education type prestige.pred
1    1000        10   bc    -1352364.5
2   20000        10   bc  -533597423.4
3    1000        20   bc    -1382361.7
4   20000        20   bc  -534229639.3
5    1000        10 prof      398464.2
6   20000        10 prof   155567014.1
7    1000        20 prof      409271.3
8   20000        20 prof   155765754.7
9    1000        10   wc    -7661464.3
10  20000        10   wc -3074382169.9
11   1000        20   wc    -7634693.8
12  20000        20   wc -3073902696.6
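The predictions above are enormous, most likely because the grid passed to predict() is in raw units (income of 1000 and 20000, education of 10 and 20) while the model was fit on standardized columns. One way to handle this, sketched below, is to keep the center and scale that scale() stores as attributes and apply them to the new data before predicting. The sketch uses R's built-in mtcars dataset as a stand-in for Prestige, with hp playing the role of income; the names dat, fit_demo, new_raw, and new_scaled are hypothetical.

```r
# mtcars ships with R and stands in for Prestige in this sketch
dat <- mtcars

# Scale the predictor, remembering the center and scale that were used
hp_scaled <- scale(dat$hp)
hp_center <- attr(hp_scaled, "scaled:center")
hp_scale  <- attr(hp_scaled, "scaled:scale")
dat$hp <- c(hp_scaled)        # store as a plain numeric vector

fit_demo <- lm(mpg ~ hp + I(hp^2), data = dat)

# New observations arrive in raw units, so transform them the same way
new_raw <- data.frame(hp = c(100, 200))
new_scaled <- data.frame(hp = (new_raw$hp - hp_center) / hp_scale)

new_raw$mpg_pred <- unname(predict(fit_demo, newdata = new_scaled))
new_raw
```

For Prestige, the same idea would mean saving the attributes of scale(Prestige$income) and scale(Prestige$education) before stripping them, then transforming the income and education columns of pred before calling predict().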
Also note that you can simplify the formula slightly with
fit <- lm(prestige ~ (income + I(income^2) + education + I(education^2))*type +
income:education + type:income:education, Prestige, na.action="na.omit")
This uses * to create many of the interaction terms.
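A quick way to check that the shorter formula specifies the same model is to compare the term labels that terms() derives from each formula; no data is needed for this. The names long_form and short_form below are just labels for the two formulas in this thread.

```r
# The formula as written in the question
long_form <- prestige ~ income + I(income^2) + education + I(education^2) +
  income:education + type + type:income + type:I(income^2) +
  type:education + type:I(education^2) + type:income:education

# The shortened version using * to expand main effects and interactions
short_form <- prestige ~ (income + I(income^2) + education + I(education^2)) * type +
  income:education + type:income:education

long_terms  <- attr(terms(long_form),  "term.labels")
short_terms <- attr(terms(short_form), "term.labels")

# Compare the two sets of model terms, ignoring order
setequal(long_terms, short_terms)
setdiff(long_terms, short_terms)   # terms only in the long form, if any
```

If setdiff() comes back empty in both directions, the two formulas expand to the same set of terms and only differ in how compactly they are written.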
Answer 1: (score: 2)
scale() adds attributes that seem to cause problems for lm(). Using
Prestige$education <- as.numeric(scale(Prestige$education))
Prestige$income <- as.numeric(scale(Prestige$income))
makes everything work.