sparse_matrix <- sparse.model.matrix(state ~ . - Month - LanID, data = smoted_data)
smoted_data_mat_label <- as.matrix(as.numeric(smoted_data$state))
# Overwrites the label assigned on the previous line
smoted_data_mat_label <- smoted_data[, "state"] == 1
xgb.fit <- xgboost(
  data = sparse_matrix,
  label = smoted_data_mat_label,
  eta = 1.8,
  max_depth = 7,
  min_child_weight = 11,           # changing this value made no difference to the results
  nrounds = 9,                     # changing this value made no difference to the results
  nfold = 10,                      # nfold belongs to xgb.cv(); xgboost() does not cross-validate
  objective = "binary:logistic",   # binary classification, outputs probabilities
  verbose = 1,                     # print evaluation progress (0 = silent)
  early_stopping_rounds = 10       # stop if no improvement for 10 consecutive rounds
)
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) :
The length of labels must equal to the number of rows in the input data
> length(smoted_data_mat_label)
[1] 22721
> length(sparse_matrix)
[1] 41739681
> dim(smoted_data)
[1] 22721 31
> length(smoted_data$LanID)
[1] 22721
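This code used to run fine, but now it fails with the error shown, and I don't know why. I checked the class of sparse_matrix: it is a matrix, and the label is a matrix as well.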
Answer 0 (score: 0)
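The problem occurred because the new data contained some NA values. sparse.model.matrix() builds a model frame, which by default drops every row containing an NA, so the matrix ended up with fewer rows than the label vector built from the full data frame. After removing the NAs, the algorithm ran fine.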
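A minimal sketch of the mismatch, assuming the default R setting options(na.action = "na.omit"). It uses base R's model.matrix() on a small made-up data frame (df and its columns are hypothetical stand-ins for smoted_data); sparse.model.matrix() from the Matrix package should behave the same way, since it also goes through model.frame(). Note too that length() on a matrix counts cells rather than rows: the 41739681 reported in the question is not a multiple of 22721, which already hints that the row counts differ. nrow() is the value to compare against the label length.

```r
# Toy data standing in for smoted_data (hypothetical values).
df <- data.frame(
  state = c(1, 0, 1, 0, 1),
  x1    = c(0.5, NA, 1.2, 3.4, 2.2),
  x2    = factor(c("a", "b", "a", NA, "b"))
)

# With the default na.action (na.omit), rows containing NA are dropped
# while the model frame is built.
mm    <- model.matrix(state ~ ., data = df)
label <- df$state == 1

nrow(mm)       # rows kept in the design matrix
length(label)  # the label still covers every original row

# One possible fix: drop incomplete rows first, then build both the
# matrix and the label from the same cleaned data frame.
df_clean    <- df[complete.cases(df), ]
mm_clean    <- model.matrix(state ~ ., data = df_clean)
label_clean <- df_clean$state == 1

# Row count and label length now agree.
stopifnot(nrow(mm_clean) == length(label_clean))
```

Whether to drop or impute the incomplete rows depends on the data. The point of the sketch is only that the matrix and the label must be derived from the same set of rows.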