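I am trying to predict whether an airline will add a route to its existing network by looking at its previous additions and training a model on the previous year's data. I have used xgboost for this before and it worked fine, but I removed some cities and now xgboost just predicts 50:50 for everything.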
library(Matrix)   # sparse.model.matrix
library(xgboost)  # xgb.DMatrix, xgb.train

# One-hot encode the predictors; "add" is the 0/1 label
trainm <- sparse.model.matrix(add ~ . - 1, data = train)
train_label <- train[, "add"]
train_matrix <- xgb.DMatrix(data = trainm, label = train_label)

testm <- sparse.model.matrix(add ~ . - 1, data = test)
test_label <- test[, "add"]
test_matrix <- xgb.DMatrix(data = testm, label = test_label)

nc <- length(unique(train_label))  # number of classes (not used below)

# `weight` is defined elsewhere in the script (not shown here)
xgb_params <- list(objective = "binary:logistic",
                   eval_metric = "error",
                   scale_pos_weight = weight,
                   booster = "gbtree",
                   nthread = 2)  # number of threads

watchlist <- list(train = train_matrix, test = test_matrix)
bst_model <- xgb.train(params = xgb_params,
                       data = train_matrix,
                       nrounds = 10,
                       watchlist = watchlist)
Output:
[1] train-error:0.972469 test-error:0.972580
[2] train-error:0.972469 test-error:0.972580
[3] train-error:0.972469 test-error:0.972580
[4] train-error:0.972469 test-error:0.972580
[5] train-error:0.972469 test-error:0.972580
[6] train-error:0.972469 test-error:0.972580
[7] train-error:0.972469 test-error:0.972580
[8] train-error:0.972469 test-error:0.972580
[9] train-error:0.972469 test-error:0.972580
[10] train-error:0.972469 test-error:0.972580
It is weighted because the data is very imbalanced (about 36 negatives for every positive). I just don't know why it suddenly stopped working.
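For reference, a small sketch of the arithmetic behind the weighting and the error figure. The toy label vector is made up to match the roughly 36:1 ratio; it is not the real data. Setting `scale_pos_weight` to the negative-to-positive count ratio is the convention suggested in the xgboost documentation, and this assumes `weight` was derived that way.

```r
# Toy labels with the imbalance described: 36 negatives per positive.
# These values are illustrative, not the real training labels.
label <- c(rep(0, 36 * 100), rep(1, 100))

# Common choice for scale_pos_weight: count(negative) / count(positive).
weight <- sum(label == 0) / sum(label == 1)

# Error rate of two trivial classifiers on these labels.
err_all_negative <- mean(label != 0)  # always predict 0
err_all_positive <- mean(label != 1)  # always predict 1

print(weight)
print(round(err_all_negative, 4))
print(round(err_all_positive, 4))
```

At a 36:1 ratio, predicting the positive class for every row gives an error of about 0.973, which is close to the 0.9725 in the log. So the constant error may mean the model is labelling every row as positive, rather than scoring everything at exactly 0.5.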
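EDIT: It fixed itself, and I don't understand why.
EDIT 2: It did it again, and I don't know why.
EDIT 3: I fixed it. It was related to NA values in some of the columns.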
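A minimal sketch of how NA values could be checked before building the matrices. The `toy` data frame and its `distance` and `hub` columns are hypothetical stand-ins for the real `train` data. It uses base R's `model.frame`, which drops incomplete rows under `na.omit`. `sparse.model.matrix` is assumed here to build its model frame the same way, in which case the matrix can end up with fewer rows than a label vector taken straight from the data frame.

```r
# Hypothetical toy data: "add" is the label, two predictors contain NAs.
toy <- data.frame(
  add      = c(0, 0, 1, 0, 1, 0),
  distance = c(120, NA, 340, 560, NA, 90),
  hub      = factor(c("A", "B", "A", NA, "B", "A"))
)

# Count missing values per column to see which predictors are affected.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Building the model frame with na.omit silently drops incomplete rows.
mf <- model.frame(add ~ . - 1, data = toy, na.action = na.omit)
rows_kept <- nrow(mf)

# Labels taken from the raw data frame keep every row, so they no longer
# line up with the rows that survived.
labels_full <- toy[, "add"]
print(c(rows_in_frame = rows_kept, labels = length(labels_full)))

# Labels taken from the model frame stay aligned with the kept rows.
labels_aligned <- unname(model.response(mf))
print(labels_aligned)
```

On the real data, the same check would be `colSums(is.na(train))`, followed by dropping or imputing the affected rows before calling `sparse.model.matrix`, so that the labels and the matrix rows come from the same set of rows.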