Question

Consider the following code:

library(ISLR)

row_list <- structure(list(`1` = 1:40, `2` = 41:79, `3` = 80:118, `4` = 119:157, 
               `5` = 158:196, `6` = 197:235, `7` = 236:274, `8` = 275:313, 
               `9` = 314:352, `10` = 353:392), 
          .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
test <- row_list[[1]]
train <- setdiff(unlist(row_list), row_list[[1]])

Output 1:

> glm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)

Call:  glm(formula = mpg ~ poly(horsepower, 1), data = Auto, subset = train)

Coefficients:
        (Intercept)  poly(horsepower, 1)  
              23.37              -133.05  

Degrees of Freedom: 351 Total (i.e. Null);  350 Residual
Null Deviance:      21460 
Residual Deviance: 8421     AIC: 2122

Output 2:

> glm(mpg ~ poly(horsepower, 1), data = Auto[train,])

Call:  glm(formula = mpg ~ poly(horsepower, 1), data = Auto[train, ])

Coefficients:
        (Intercept)  poly(horsepower, 1)  
              24.05              -114.19  

Degrees of Freedom: 351 Total (i.e. Null);  350 Residual
Null Deviance:      21460 
Residual Deviance: 8421     AIC: 2122

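As shown above, the (Intercept) and poly(horsepower, 1) coefficients differ between the two outputs, even though the degrees of freedom, deviances, and AIC are identical. Why is that?

At least for lm(), An Introduction to Statistical Learning suggests (see p. 191) that row indices can be used in the subset argument. Is that not the case for glm(), or am I using subset incorrectly?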

Answer 1

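This has to do with how poly() constructs orthogonal polynomials.

In the first example the polynomial basis is built from all rows of the data before subsetting, whereas in the second the subsetting happens first (you pass already-subsetted data to glm()), so the basis is built from the training rows only.

Using raw polynomials (raw = TRUE) gives identical results either way: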

> coef(glm(mpg ~ poly(hp, 1), data = mtcars, subset = 10:32))
(Intercept) poly(hp, 1) 
   20.63307   -28.66876 
> coef(glm(mpg ~ poly(hp, 1), data = mtcars[10:32, ]))
(Intercept) poly(hp, 1) 
   19.93043   -25.43935 
> coef(glm(mpg ~ poly(hp, 1, raw = TRUE), data = mtcars, subset = 10:32))
            (Intercept) poly(hp, 1, raw = TRUE) 
            31.64927851             -0.07509986 
> coef(glm(mpg ~ poly(hp, 1, raw = TRUE), data = mtcars[10:32, ]))
            (Intercept) poly(hp, 1, raw = TRUE) 
            31.64927851             -0.07509986
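
To see the effect more directly, here is a minimal sketch using the same mtcars rows as above. It compares the orthogonal basis column that each call ends up with, then checks the fitted values. The variable names (rows, basis_full, basis_sub, fit_subset, fit_data) are only illustrative, and the sketch assumes that subset= is applied after the formula terms have been evaluated on the full data, as the explanation above describes.

```r
rows <- 10:32

# Basis built from all 32 rows, then restricted to the subset
# (what the `subset =` route effectively does).
basis_full <- as.numeric(poly(mtcars$hp, 1)[rows, 1])

# Basis built only from the subset rows
# (what `data = mtcars[rows, ]` does).
basis_sub <- as.numeric(poly(mtcars$hp[rows], 1)[, 1])

# The two columns are centered and scaled differently, so this should be FALSE.
same_basis <- isTRUE(all.equal(basis_full, basis_sub))
print(same_basis)

fit_subset <- glm(mpg ~ poly(hp, 1), data = mtcars, subset = rows)
fit_data   <- glm(mpg ~ poly(hp, 1), data = mtcars[rows, ])

# Both bases are linear functions of hp, so the two models span the same
# space and the fitted values should agree even though the coefficients differ.
same_fit <- isTRUE(all.equal(unname(fitted(fit_subset)),
                             unname(fitted(fit_data))))
print(same_fit)
```

In other words, the two calls fit the same model in different parameterizations, which is consistent with the matching deviance and AIC in the question's two outputs.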

