In Introduction to Statistical Learning, we are asked to carry out leave-one-out cross-validation (LOOCV) for a logistic regression manually. Here is the code:
count = rep(0, dim(Weekly)[1])
for (i in 1:(dim(Weekly)[1])) {
  ## fit a logistic regression model, excluding the ith observation from the training data
  glm.fit = glm(Direction ~ Lag1 + Lag2, data = Weekly[-i, ], family = binomial)
  ## predict the held-out observation and compare with the true direction
  is_up = predict.glm(glm.fit, Weekly[i, ], type = "response") > 0.5
  is_true_up = Weekly[i, ]$Direction == "Up"
  if (is_up != is_true_up)
    count[i] = 1
}
sum(count)
##[1] 490
The source of this code can be found here.
This means the error rate is about 45% (490 misclassified out of the 1089 observations in Weekly).
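The same manual LOOCV can also be written more compactly. Here is a sketch, assuming the Weekly data set from the ISLR package has been loaded:

library(ISLR)  ## provides the Weekly data set
errs = sapply(1:nrow(Weekly), function(i) {
  fit = glm(Direction ~ Lag1 + Lag2, data = Weekly[-i, ], family = binomial)
  ## predict the held-out week and record whether the prediction was wrong
  (predict(fit, Weekly[i, ], type = "response") > 0.5) != (Weekly[i, ]$Direction == "Up")
})
mean(errs)  ## LOOCV misclassification rate, about 0.45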
But when we do the same thing using the cv.glm() function from the boot library, the result is very different:
> library(boot)
> glm.fit = glm(Direction ~ Lag1 + Lag2, data = Weekly, family = binomial)
> cv.err = cv.glm(Weekly, glm.fit)
> cv.err$delta
[1] 0.2464536 0.2464530
Why does this happen? What exactly does the cv.glm() function compute?
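For reference, ?cv.glm documents that the default cost function is the average squared error between the response and the fitted probabilities, and it shows how a misclassification cost can be supplied for binary responses. A sketch of such a call, with the cost function taken from the example in the boot documentation:

## misclassification cost: r is the observed 0/1 response, pi the predicted probability
## (this cost function follows the example in ?cv.glm)
cost = function(r, pi = 0) mean(abs(r - pi) > 0.5)
cv.err = cv.glm(Weekly, glm.fit, cost = cost)
cv.err$delta  ## should now be on the same 0-1 misclassification scale as the manual loop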