在stats :: glm()中,为什么子集参数会给我自己的数据参数子集提供不同的结果?

时间:2017-10-25 17:55:03

标签: r

请考虑以下代码:

library(ISLR)

row_list <- structure(list(`1` = 1:40, `2` = 41:79, `3` = 80:118, `4` = 119:157, 
               `5` = 158:196, `6` = 197:235, `7` = 236:274, `8` = 275:313, 
               `9` = 314:352, `10` = 353:392), 
          .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
test <- row_list[[1]]
train <- setdiff(unlist(row_list), row_list[[1]])

输出1:

> glm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)

Call:  glm(formula = mpg ~ poly(horsepower, 1), data = Auto, subset = train)

Coefficients:
        (Intercept)  poly(horsepower, 1)  
              23.37              -133.05  

Degrees of Freedom: 351 Total (i.e. Null);  350 Residual
Null Deviance:      21460 
Residual Deviance: 8421     AIC: 2122

输出2:

> glm(mpg ~ poly(horsepower, 1), data = Auto[train,])

Call:  glm(formula = mpg ~ poly(horsepower, 1), data = Auto[train, ])

Coefficients:
        (Intercept)  poly(horsepower, 1)  
              24.05              -114.19  

Degrees of Freedom: 351 Total (i.e. Null);  350 Residual
Null Deviance:      21460 
Residual Deviance: 8421     AIC: 2122

如上所示,(Intercept)poly(horsepower, 1)值在两个输出之间不同。这是为什么?

至少对于lm()统计学习简介建议(参见第191页)行索引可以在subset参数中使用。 glm()不是这种情况,或subset是不是正确使用了吗?

1 个答案:

答案 0 :(得分:7)

这与poly如何构造正交多项式有关。

在第一个示例中,它们是在子集化之前构建的,而在第二个示例中,首先进行子集化(当您将子集化数据传递给glm时)。

使用原始多项式可得到相同的结果:

coef(glm(mpg~poly(hp,1),data=mtcars,subset=10:32))
(Intercept) poly(hp, 1) 
   20.63307   -28.66876 
coef(glm(mpg~poly(hp,1),data=mtcars[10:32,]))
(Intercept) poly(hp, 1) 
   19.93043   -25.43935 
coef(glm(mpg~poly(hp,1,raw=TRUE),data=mtcars,subset=10:32))
            (Intercept) poly(hp, 1, raw = TRUE) 
            31.64927851             -0.07509986 
coef(glm(mpg~poly(hp,1,raw=TRUE),data=mtcars[10:32,]))
            (Intercept) poly(hp, 1, raw = TRUE) 
            31.64927851             -0.07509986