In Introduction to Statistical Learning, we are asked to carry out leave-one-out cross-validation (LOOCV) for a logistic regression manually. Here is the code:
count = rep(0, dim(Weekly)[1])
for (i in 1:(dim(Weekly)[1])) {
  ## fit a logistic regression model, excluding the ith observation from the training data
  glm.fit = glm(Direction ~ Lag1 + Lag2, data = Weekly[-i, ], family = binomial)
  ## predict the held-out observation and compare with the true direction
  is_up = predict.glm(glm.fit, Weekly[i, ], type = "response") > 0.5
  is_true_up = Weekly[i, ]$Direction == "Up"
  if (is_up != is_true_up)
    count[i] = 1
}
sum(count)
##[1] 490
The source of this code can be found here.
This means the error rate is about 45% (490 misclassified out of the 1089 observations in Weekly).
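The same manual LOOCV can also be written more compactly. Here is a sketch, assuming the Weekly data set from the ISLR package has been loaded:

library(ISLR)  ## provides the Weekly data set
errs = sapply(1:nrow(Weekly), function(i) {
  fit = glm(Direction ~ Lag1 + Lag2, data = Weekly[-i, ], family = binomial)
  ## predict the held-out week and record whether the prediction was wrong
  (predict(fit, Weekly[i, ], type = "response") > 0.5) != (Weekly[i, ]$Direction == "Up")
})
mean(errs)  ## LOOCV misclassification rate, about 0.45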
But when we do the same thing using the cv.glm() function from the boot library, the result is very different:
> library(boot)
> glm.fit = glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial)
> cv.err = cv.glm(Weekly, glm.fit)
> cv.err$delta
[1] 0.2464536 0.2464530
Why does this happen? What exactly does the cv.glm() function compute?
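For reference, ?cv.glm documents that the default cost function is the average squared error between the response and the fitted probabilities, and it shows how a misclassification cost can be supplied for binary responses. A sketch of such a call, with the cost function taken from the example in the boot documentation:

## misclassification cost: r is the observed 0/1 response, pi the predicted probability
## (this cost function follows the example in ?cv.glm)
cost = function(r, pi = 0) mean(abs(r - pi) > 0.5)
cv.err = cv.glm(Weekly, glm.fit, cost = cost)
cv.err$delta  ## should now be on the same 0-1 misclassification scale as the manual loop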