Passing a numeric matrix as input data and a numeric factor as the label to XGBoost, but still getting an "invalid input data" error?

Asked: 2018-06-08 13:41:31

Tags: r data-science modeling xgboost gradient-descent

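XGBoost requires a numeric matrix as its input data and a numeric vector as its label. However, I still get "invalid input data" and "label will be ignored" as error messages. My code is attached below. Is there a way for me to pass a numeric matrix as the input data and/or a numeric vector as the label?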

# Re-factor target column
# Attempting to use a numeric vector as the label; not sure whether this works

train$NAME_EDUCATION_TYPE  <- as.numeric(factor(train$NAME_EDUCATION_TYPE , labels = c(1:5)))
test$NAME_EDUCATION_TYPE  <- as.numeric(factor(test$NAME_EDUCATION_TYPE , labels = c(1:5)))
# Replace NAs with median
for (col in c("AMT_ANNUITY", "EXT_SOURCE_1", "EXT_SOURCE_2", "EXT_SOURCE_3")) {
  # Fill each column's missing values with that column's median
  train[[col]][is.na(train[[col]])] <- median(train[[col]], na.rm = TRUE)
}

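# Find the fraction of NAs in each column
library(dplyr)  # provides %>% and bind_rows()
lapply(seq_along(train), function(i) {
  data.frame(
    column = colnames(train)[i],
    # Divide the NA count, not the whole data frame, by the number of rows
    na_fraction = sum(is.na(train[[i]])) / nrow(train),
    stringsAsFactors = FALSE
  )
}) %>% bind_rows()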

#---Checks data types of train
str(train_raw)

#-----XGBOOST Model
#Current error messages: invalid input data, label will be ignored
xgb_model <- xgboost(data = suppressWarnings(as.numeric(as.matrix(train))),
           label = train_raw$TARGET,
           nrounds = 5,
           objective = "binary:logistic",
           params = list(
             booster = "gblinear", 
             eta = 0.05,
             lambda = 1,
             lambda_bias = 1,
             gamma = 1,
             early_stopping_rounds = 3,
             eval_metric = "rmse")
           )

Any help resolving the "invalid input data" and "label will be ignored" errors would be greatly appreciated.
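One likely cause, offered as a sketch rather than a confirmed diagnosis: `as.numeric()` drops the `dim` attribute, so `as.numeric(as.matrix(train))` hands XGBoost a flat numeric vector instead of a matrix. The example below uses a small made-up data frame (`toy`, with hypothetical values) to show the difference and then builds the matrix with `data.matrix()`, leaving the `TARGET` column out of the features. It assumes an installed xgboost R package that provides `xgb.DMatrix()` and `xgb.train()`, and it only runs the training step if the package is available. The parameter choices are illustrative: `early_stopping_rounds` is an argument of the training function rather than an entry in `params`, `gamma` applies to tree boosters rather than `gblinear`, and `logloss` is used here in place of `rmse` as a more typical metric for `binary:logistic`.

```r
# Toy data standing in for `train` (hypothetical columns and values).
set.seed(42)
toy <- data.frame(
  TARGET       = rep(c(0, 1), times = 10),
  AMT_ANNUITY  = runif(20, 1000, 5000),
  EXT_SOURCE_1 = runif(20)
)

# as.numeric() strips the dim attribute, so the result is a plain vector.
flattened <- as.numeric(as.matrix(toy))
is.matrix(flattened)  # FALSE

# data.matrix() converts column by column and keeps the matrix shape.
# The label column is excluded from the features.
features <- data.matrix(toy[, setdiff(names(toy), "TARGET")])
label    <- toy$TARGET
is.matrix(features)   # TRUE

# Train only if xgboost is installed (assumption: xgb.DMatrix/xgb.train exist).
if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(data = features, label = label)
  model <- xgboost::xgb.train(
    params = list(
      booster     = "gblinear",
      objective   = "binary:logistic",
      eta         = 0.05,
      lambda      = 1,
      eval_metric = "logloss"
    ),
    data    = dtrain,
    nrounds = 5,
    verbose = 0
  )
}
```

If the real `train` contains character columns, `data.matrix()` converts them to integer codes via factors, so it may be worth checking `str(train)` first to confirm that encoding is what is intended.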

0 answers:

No answers.