Cross-validation for logistic regression

Time: 2019-08-26 21:31:14

Tags: r

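I am having trouble running 10-fold cross-validation for a logistic regression in R.

I used the cv.glm() function, but it throws an error. When I use the same function on the Smarket data from the ISLR package, no error appears. The predictors in my logistic regression are binary.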

# 10-Fold Cross-Validation for Logistic Regression
cv.errorlog7 <- cv.glm(p, logit7, K=10)$delta[1] 

I get the following error message:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor gender has new levels Other
In addition: Warning messages:
1: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type ==  :
  prediction from a rank-deficient fit may be misleading
2: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type ==  :
  prediction from a rank-deficient fit may be misleading
3: In y - yhat :
  longer object length is not a multiple of shorter object length

1 answer:

Answer 0 (score: 1)

I ran into a very similar error:

> set.seed(100)
> cv.lm(data = catering1, form.lm = model, m=3) # 3 fold cross-validation
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor Month has new levels July
# Reset seed
> set.seed(1000)
> cv.lm(data = catering1, form.lm = model, m=3) # 3 fold cross-validation
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor Month has new levels July

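As you can see, I even reset the seed and tried again, with no luck. However, once I increased the number of folds (I kept adding one fold at a time until I got a response), the code ran. I did still get an error and a warning, though.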

> cv.lm(data = catering, form.lm = model, m=5) # 5 fold cross-validation
Response.... Anova table....
Error in which.min(xval) : 
  'list' object cannot be coerced to type 'double'
In addition: Warning message:
In cv.lm(data = catering, form.lm = model, m = 5) : 

 As there is >1 explanatory variable, cross-validation
 predicted values for a fold are not a linear function
 of corresponding overall predicted values.  Lines that
 are shown for the different folds are approximate

So I would try increasing the number of folds. Since your dataset is relatively small, doing so should not affect performance much.
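The "factor ... has new levels" message suggests that every row with a given level (here `Other` in `gender`, or `July` in `Month`) landed in the held-out fold, so the model fitted on the remaining folds never saw that level. Below is a minimal sketch of how one might check for such sparse levels before calling cv.glm(). The original `p` and `logit7` are not shown, so it uses simulated data. The data frame `d`, the threshold `min_n`, and the decision to drop sparse rows are all assumptions made for illustration, not part of the original question.

```r
library(boot)  # provides cv.glm()

set.seed(1)
n <- 201

# Simulated stand-in for the asker's data (assumption): the level
# "Other" occurs exactly once, the other two levels are common.
d <- data.frame(
  y      = rbinom(n, 1, 0.5),
  gender = factor(c("Other",
                    sample(c("Female", "Male"), n - 1, replace = TRUE)))
)

# Step 1: count rows per level; very small counts are the suspects.
counts <- table(d$gender)
print(counts)

# Step 2: keep only rows whose level has at least min_n observations
# (min_n is an arbitrary illustrative threshold), then drop the now
# unused levels from the factor.
min_n <- 10
keep  <- d$gender %in% names(counts)[counts >= min_n]
d2    <- droplevels(d[keep, ])

# Step 3: refit on the filtered data and cross-validate as before.
fit    <- glm(y ~ gender, data = d2, family = binomial)
cv_err <- cv.glm(d2, fit, K = 10)$delta[1]
print(cv_err)
```

Dropping rows discards information, so merging a sparse level into a related category may be preferable when that makes sense for the data. Increasing the number of folds, as suggested above, makes it less likely that all rows of a level are held out together, but it cannot help when a level has only a single row.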