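I am getting the error `[.default`(cm, 2, 2) : subscript out of bounds while implementing cross-validation with xgboost. The structure of my dataset is as follows: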
'data.frame': 889 obs. of 7 variables:
$ Survived: Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Sex : num 1 2 2 2 1 1 1 1 2 2 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Embarked: num 3 1 3 3 3 2 3 3 3 1 ...
- attr(*, "na.action")=Class 'omit' Named int [1:2] 62 830
.. ..- attr(*, "names")= chr [1:2] "62" "830"
A summary of my dataset:
Survived Pclass Sex SibSp Parch
0:549 Min. :1.000 Min. :1.000 Min. :0.0000 Min. :0.0000
1:340 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :3.000 Median :1.000 Median :0.0000 Median :0.0000
Mean :2.312 Mean :1.351 Mean :0.5242 Mean :0.3825
3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :3.000 Max. :2.000 Max. :8.0000 Max. :6.0000
Fare Embarked
Min. : 0.000 Min. :1.000
1st Qu.: 7.896 1st Qu.:2.000
Median : 14.454 Median :3.000
Mean : 32.097 Mean :2.535
3rd Qu.: 31.000 3rd Qu.:3.000
Max. :512.329 Max. :3.000
The error is thrown when I run the following code:
library(caret)
folds = createFolds(traindataset$Survived, k = 10)
cv = lapply(folds, function(x) {
  training_fold = traindataset[-x, ]
  test_fold = traindataset[x, ]
  classifier = xgboost(data = as.matrix(traindataset[-1]), label = traindataset$Survived, nrounds = 10)
  y_pred = predict(classifier, newdata = as.matrix(test_fold[-1]))
  y_pred = (y_pred >= 0.5)
  cm = table(test_fold[, 1], y_pred)
  accuracy = (cm[1,1] + cm[2,2]) / (cm[1,1] + cm[2,2] + cm[1,2] + cm[2,1])
  return(accuracy)
})
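Note that I converted Survived from an integer of 0s and 1s into a factor for classification purposes. To my surprise, the code works when Survived is an integer, but I get this error when it is a factor.
Any help is appreciated. Thanks.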
Answer 0 (score: 1)
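I found the solution to the problem. Sorry for the inconvenience.
Converting the target variable to a factor was what caused the problem. My assumption is that xgboost requires numeric input rather than factors, which is why the error appeared.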
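One plausible reading of the error, offered as a sketch rather than a confirmed diagnosis: if the factor label is coerced to its internal level codes (1 and 2) and the model is fit as a regression, every prediction lands well above 0.5, y_pred becomes all TRUE, and table() returns a 2 x 1 matrix, so cm[2,2] does not exist. The base R snippet below reproduces that mechanism without calling xgboost. The vector survived and the helper accuracy_from_predictions are hypothetical names introduced here for illustration, and the all-TRUE predictions are simulated, not taken from a real model.

```r
# Hypothetical stand-in for the Survived column: a factor with levels "0" and "1".
survived <- factor(c(0, 1, 1, 0, 1))

# as.numeric() on a factor returns the internal level codes (1 and 2),
# not the original 0/1 values.
codes <- as.numeric(survived)

# Going through character recovers the original 0/1 labels.
labels01 <- as.numeric(as.character(survived))

# Simulated outcome of thresholding predictions that sit near 1..2:
# every row is TRUE, so the table has only one column.
y_pred_all_true <- codes >= 0.5
cm_bad <- table(survived, y_pred_all_true)
dim(cm_bad)  # 2 rows, 1 column: cm_bad[2, 2] is out of bounds

# Fixing the levels on both sides keeps the table 2 x 2 even when one
# class never appears among the predictions.
accuracy_from_predictions <- function(actual01, predicted_prob, threshold = 0.5) {
  actual <- factor(actual01, levels = c(0, 1))
  predicted <- factor(as.integer(predicted_prob >= threshold), levels = c(0, 1))
  cm <- table(actual, predicted)
  sum(diag(cm)) / sum(cm)
}

# Usage with made-up probabilities for the five rows above.
accuracy_from_predictions(labels01, c(0.1, 0.9, 0.8, 0.2, 0.3))
```

Under these assumptions, passing labels01 (or the original integer column) as the label avoids the problem, and building the confusion matrix with fixed levels guards against a fold in which only one class is predicted. Separately, the loop in the question fits on traindataset rather than training_fold, so the held-out rows are also seen during training; fitting on training_fold is probably what was intended.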