Consider the following code:
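library(ISLR)  # provides the Auto data set (392 rows)
# Ten contiguous folds of row indices covering rows 1-392 of Auto
row_list <- structure(list(`1` = 1:40, `2` = 41:79, `3` = 80:118, `4` = 119:157,
                           `5` = 158:196, `6` = 197:235, `7` = 236:274, `8` = 275:313,
                           `9` = 314:352, `10` = 353:392),
                      .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
test <- row_list[[1]]                               # fold 1 is held out
train <- setdiff(unlist(row_list), row_list[[1]])   # rows 41-392 (352 rows) are used for fitting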
> glm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)
Call: glm(formula = mpg ~ poly(horsepower, 1), data = Auto, subset = train)
Coefficients:
(Intercept) poly(horsepower, 1)
23.37 -133.05
Degrees of Freedom: 351 Total (i.e. Null); 350 Residual
Null Deviance: 21460
Residual Deviance: 8421 AIC: 2122
> glm(mpg ~ poly(horsepower, 1), data = Auto[train,])
Call: glm(formula = mpg ~ poly(horsepower, 1), data = Auto[train, ])
Coefficients:
(Intercept) poly(horsepower, 1)
24.05 -114.19
Degrees of Freedom: 351 Total (i.e. Null); 350 Residual
Null Deviance: 21460
Residual Deviance: 8421 AIC: 2122
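As shown above, the (Intercept) and poly(horsepower, 1) coefficients differ between the two outputs, even though the residual deviance and AIC are identical. Why is that?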
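At least for lm(), An Introduction to Statistical Learning suggests (see p. 191) that row indices can be passed to the subset argument. Is that not the case for glm(), or am I using subset incorrectly?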
Answer:
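This has to do with how poly() constructs orthogonal polynomials: the basis it returns depends on the data it is computed from. In the first call, glm() passes subset on to model.frame(), which evaluates poly(horsepower, 1) on all 392 rows of Auto and only then drops the non-training rows. In the second call, the subsetting happens first (you pass the already subsetted data to glm()), so poly() builds its basis from the 352 training rows alone. Both bases span the same space as an intercept plus horsepower, so the fitted values, residual deviance and AIC are identical; only the parameterization, and therefore the coefficients, differ.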
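A minimal sketch of that mechanism in base R, assuming mtcars (shipped with R, so no extra package is needed) and the rows 10:32 used in the example that follows. It compares the orthogonal basis computed on all rows and then subsetted with the basis computed on the subset alone. The two columns differ, but one is an exact affine function of the other, and because the subset-only basis is centered on those rows, the intercept of the `data = mtcars[rows, ]` fit is just the mean of mpg over those rows.

```r
# Base R only: mtcars ships with the datasets package.
rows <- 10:32

# Basis built on all 32 rows, then subsetted (what `subset = rows` amounts to)
basis_then_subset <- poly(mtcars$hp, 1)[rows, 1]

# Basis built on the subset alone (what `data = mtcars[rows, ]` amounts to)
subset_then_basis <- poly(mtcars$hp[rows], 1)[, 1]

# The two columns are not equal ...
print(isTRUE(all.equal(basis_then_subset, subset_then_basis)))  # FALSE

# ... but one is an exact linear function of the other, i.e. same column space
link <- lm(basis_then_subset ~ subset_then_basis)
print(max(abs(residuals(link))))  # numerically zero

# Only the subset-only basis is centered on these rows
print(c(full_then_subset = mean(basis_then_subset),
        subset_only      = mean(subset_then_basis)))

# Hence, with data = mtcars[rows, ], the intercept equals mean(mpg) over the rows
fit_sliced <- glm(mpg ~ poly(hp, 1), data = mtcars[rows, ])
print(c(intercept = unname(coef(fit_sliced)[1]),
        mean_mpg  = mean(mtcars$mpg[rows])))
```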
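The same effect can be reproduced with mtcars; with raw polynomials (raw = TRUE), both approaches give identical coefficients: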
# Orthogonal basis: built on all 32 rows, then subsetted
coef(glm(mpg~poly(hp,1),data=mtcars,subset=10:32))
(Intercept) poly(hp, 1)
20.63307 -28.66876
# Orthogonal basis: built on rows 10-32 only
coef(glm(mpg~poly(hp,1),data=mtcars[10:32,]))
(Intercept) poly(hp, 1)
19.93043 -25.43935
# Raw polynomial: the term does not depend on which rows are present, so both calls agree
coef(glm(mpg~poly(hp,1,raw=TRUE),data=mtcars,subset=10:32))
(Intercept) poly(hp, 1, raw = TRUE)
31.64927851 -0.07509986
coef(glm(mpg~poly(hp,1,raw=TRUE),data=mtcars[10:32,]))
(Intercept) poly(hp, 1, raw = TRUE)
31.64927851 -0.07509986
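For the train/test use in the question, what ultimately matters is the held-out error, and that does not depend on which way the orthogonal basis was built. A sketch, again using mtcars with rows 10:32 for fitting; the held-out rows 1:9 and the degree 2 are illustrative choices not taken from the original, the latter to show the point is not specific to linear terms. predict() rebuilds each model's basis for `newdata` from the coefficients poly() stored in the model terms, so both fits produce the same predictions and the same test MSE.

```r
rows     <- 10:32  # training rows, as in the answer
held_out <- 1:9    # remaining rows, used as a test set (illustrative choice)

fit_subset <- glm(mpg ~ poly(hp, 2), data = mtcars, subset = rows)
fit_sliced <- glm(mpg ~ poly(hp, 2), data = mtcars[rows, ])

# Coefficients differ because the orthogonal bases differ ...
print(rbind(subset = coef(fit_subset), sliced = coef(fit_sliced)))

# ... yet predictions on new rows agree: each model re-evaluates poly() on
# newdata using the basis coefficients it stored when it was fitted.
pred_subset <- predict(fit_subset, newdata = mtcars[held_out, ])
pred_sliced <- predict(fit_sliced, newdata = mtcars[held_out, ])
print(all.equal(pred_subset, pred_sliced))

# Hence the held-out mean squared error is the same either way.
mse <- function(pred) mean((mtcars$mpg[held_out] - pred)^2)
print(c(subset = mse(pred_subset), sliced = mse(pred_sliced)))
```

So using subset with glm() is correct; the differing coefficients only reflect a different, equally valid parameterization of the same fitted curve.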