I am running into an error with xgboost inside a for loop. The error I get is as follows:
Error in xgb.iter.eval(bst$handle, watchlist, iteration - 1, feval) :
[23:48:27] amalgamation/../src/metric/rank_metric.cc:135: Check failed: !auc_error AUC: the dataset only contains pos or neg samples
Someone has asked a similar question here.
The creator of the package suggests the following:
This means that some of your training data or evaluation data has all 1s or all 0s as labels.
My problem is a binary classification problem, with labels 0 and 1.
My code is as follows:
library(xgboost)
library(dplyr)
library(data.table)

all <- NULL
for(i in seq_along(splitxgb)){
  xgbdata <- splitxgb[[i]]
  # random 75/25 train/test split of the current group
  smp_size <- floor(0.75 * nrow(xgbdata))
  train_ind <- sample(seq_len(nrow(xgbdata)), size = smp_size)
  train <- xgbdata[train_ind, ]
  test <- xgbdata[-train_ind, ]
  ids <- sample(nrow(train))
  nfolds <- 5 # TODO: take this out of the for loop
  score <- data.table()
  result <- data.table()
  x_train <- train %>%
    select(-BvD.ID.number, -Major.sectors, -Region.in.country, -Major.sectors.id, -Region.in.country.id, -status)
  x_test <- test %>%
    select(-BvD.ID.number, -Major.sectors, -Region.in.country, -Major.sectors.id, -Region.in.country.id, -status)
  y_train <- train$status
  y_test <- test$status
  nrounds <- 12 # TODO: take out of the for loop
  early_stopping_round <- NULL # TODO: take out of the for loop
  dtrain <- xgb.DMatrix(data = as.matrix(x_train), label = y_train, missing=NaN)
  dtest <- xgb.DMatrix(data = as.matrix(x_test), missing=NaN)
  # AUC is evaluated on the training set only
  watchlist <- list(train = dtrain)
  params <- list("eta" = 0.01,
                 "max_depth" = 10, # TODO: take out of the for loop
                 "colsample_bytree" = 0.50,
                 "min_child_weight" = 0.75,
                 "subsample" = 0.5,
                 "objective" = "reg:logistic", # should this be reg_log, binary:log etc.?
                 "eval_metric" = "auc")
  model_xgb <- xgb.train(params = params,
                         data = dtrain,
                         maximize = TRUE,
                         nrounds = nrounds,
                         watchlist = watchlist,
                         early_stopping_rounds = early_stopping_round,
                         print_every_n = 1)
  pred <- predict(model_xgb, dtest)
  result <- cbind(test %>%
    select(BvD.ID.number), status = round(pred, 0), pred)
  compare <- merge(x = result, y = test[ , c("BvD.ID.number", "status", "Region.in.country", "Major.sectors")], by = "BvD.ID.number", all.x=TRUE)
  all[[i]] <- compare
}
I get the error above. However, when I take everything out of the for loop and run it on its own, for example as follows:
i <- 165
xgbdata <- splitxgb[[i]]
smp_size <- floor(0.75 * nrow(xgbdata))
train_ind <- sample(seq_len(nrow(xgbdata)), size = smp_size)
train <- xgbdata[train_ind, ]
test <- xgbdata[-train_ind, ]
ids <- sample(nrow(train))
nfolds <- 5 #TAKE this out of the forloop
score <- data.table()
result <- data.table()
x_train <- train %>%
select(-BvD.ID.number, -Major.sectors, -Region.in.country, -Major.sectors.id, -Region.in.country.id, -status)
x_test <- test %>%
select(-BvD.ID.number, -Major.sectors, -Region.in.country, -Major.sectors.id, -Region.in.country.id, -status)
y_train <- train$status
y_test <- test$status
nrounds <- 12 #take out of the for loop
early_stopping_round <- NULL # take out of the for loop
dtrain <- xgb.DMatrix(data = as.matrix(x_train), label = y_train, missing=NaN)
dtest <- xgb.DMatrix(data = as.matrix(x_test), missing=NaN)
watchlist <- list(train = dtrain)
params <- list("eta" = 0.01,
"max_depth" = 10, # take out of the for loop
"colsample_bytree" = 0.50,
"min_child_weight" = 0.75,
"subsample" = 0.5,
"objective" = "reg:logistic", #should this be reg_log, binary:log etc.
"eval_metric" = "auc")
model_xgb <- xgb.train(params = params,
data = dtrain,
maximize = TRUE,
nrounds = nrounds,
watchlist = watchlist,
early_stopping_rounds = early_stopping_round,
print_every_n = 1)
pred <- predict(model_xgb, dtest)
result <- cbind(test %>%
select(BvD.ID.number), status = round(pred, 0), pred)
compare <- merge(x = result, y = test[ , c("BvD.ID.number", "status", "Region.in.country", "Major.sectors")], by = "BvD.ID.number", all.x=TRUE)
all[[i]] <- compare
When I run this separately for each value of i, I do not get any errors.
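There is some information online, but nothing specific to the problem I am running into. Why do I get the error inside the loop but not when I run the code separately?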
Answer 0 (score: 1)
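It looks like your split sometimes divides the data so that one partition, either train or test, ends up with labels that are all 1 or all 0.
Try printing (or writing to CSV) all of the splits to see whether that is the case.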
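A minimal sketch of that check in base R, assuming `splitxgb` is a list of data frames that each carry a 0/1 `status` column, as in the question. The toy data and the helper name `label_counts` are made up for illustration.

```r
# Toy stand-in for the question's `splitxgb` (hypothetical data).
splitxgb <- list(
  data.frame(x = 1:8, status = c(0, 1, 0, 1, 0, 1, 0, 1)),
  data.frame(x = 1:8, status = c(0, 0, 0, 0, 0, 0, 0, 1)),
  data.frame(x = 1:8, status = rep(1, 8))
)

# Hypothetical helper: count the 0s and 1s in one group.
# Fixing the factor levels keeps a zero count visible for a missing class.
label_counts <- function(df) {
  table(factor(df$status, levels = c(0, 1)))
}

# One row per group, one column per label.
counts <- t(sapply(splitxgb, label_counts))
print(counts)

# Groups that contain a single class can never yield a valid split.
single_class <- which(counts[, "0"] == 0 | counts[, "1"] == 0)
print(single_class)
```

Groups such as the second one, with only one positive row, are worth watching too: a random 75/25 split can easily put that row in the test set. Since the question's `watchlist` contains only `dtrain`, the AUC is computed on a training set that then has only one class.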
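If so, you will want to make sure that every partition (train and test) has at least one row of data for each label.
You can do this by repeating the split until that is the case, or by enforcing it in your code in any other way you choose.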
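One possible way to enforce it directly is a stratified split, which samples the training rows within each label separately. This is only a sketch in base R: the helper name `stratified_train_ind` is hypothetical, and it assumes every label has at least two rows in the group, otherwise that label cannot appear in both partitions.

```r
# Hypothetical helper: pick roughly `frac` of the rows of each class for training,
# while leaving at least one row of each class for the test set.
stratified_train_ind <- function(labels, frac = 0.75) {
  idx_by_class <- split(seq_along(labels), labels)
  unlist(lapply(idx_by_class, function(idx) {
    n_take <- min(max(1, floor(frac * length(idx))), length(idx) - 1)
    # sample.int avoids the sample() pitfall when idx has length 1
    idx[sample.int(length(idx), n_take)]
  }), use.names = FALSE)
}

# Toy labels: 16 negatives and 4 positives (made-up data).
labels <- rep(c(0, 1), times = c(16, 4))
train_ind <- stratified_train_ind(labels)

print(table(labels[train_ind]))   # training labels
print(table(labels[-train_ind]))  # test labels
```

In the question's loop, `train_ind` would then come from `stratified_train_ind(xgbdata$status)` in place of the plain `sample()` call.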
I would suggest resampling until this is the case.
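A rough sketch of that resampling approach in base R. The helper name `split_with_both_classes`, its arguments, and the toy data are assumptions for illustration, not part of the question's code. It redraws the 75/25 split until both partitions contain both labels, and gives up after `max_tries` attempts so that a single-class group cannot loop forever.

```r
# Hypothetical helper: redraw the split until train and test each contain both labels.
split_with_both_classes <- function(df, label = "status", frac = 0.75, max_tries = 100) {
  for (attempt in seq_len(max_tries)) {
    train_ind <- sample(seq_len(nrow(df)), size = floor(frac * nrow(df)))
    train <- df[train_ind, , drop = FALSE]
    test <- df[-train_ind, , drop = FALSE]
    if (length(unique(train[[label]])) == 2 && length(unique(test[[label]])) == 2) {
      return(list(train = train, test = test, attempts = attempt))
    }
  }
  NULL  # no valid split found; the caller should skip this group
}

set.seed(42)
toy <- data.frame(x = 1:20, status = rep(c(0, 1), times = c(16, 4)))
parts <- split_with_both_classes(toy)
print(table(parts$train$status))
print(table(parts$test$status))

# A group with a single class can never be split validly.
one_class <- data.frame(x = 1:8, status = rep(1, 8))
print(is.null(split_with_both_classes(one_class)))
```

Because `sample()` draws a different split on every call, this would also be consistent with the same `i` failing inside the loop and then succeeding when rerun by hand. Inside the loop, a `NULL` result could be handled with `next` to skip that group.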