I want to apply observation weights in caret using the code below:
model_weights <- ifelse(train$y == 0,
(1/table(train$y)[1]) * 0.5,
(1/table(train$y)[2]) * 0.5)
xgbT <- train(x = as.matrix(train[,-21]), y = make.names(as.factor(train$y)),
method = "xgbTree",
trControl = cctrl1,
metric = "MCC",
maximize = TRUE,
weights = model_weights,
preProc = c("center", "scale"),
tuneGrid = expand.grid(nrounds = c(150), #number of trees
max_depth = c(7), #max tree depth
eta = c(0.03), #learning rate
gamma = c(0.3), #min split loss
colsample_bytree = c(0.7),
min_child_weight = c(10, 1, 5), #min number of instances in the leaf
subsample = c(0.6)), #subsample ratio of the training instance
early_stop_round = c(3), #if no improvements over specified rounds
objective = c("binary:logistic"),
silent = 0)
However, it gives me this error:
Error in model.frame.default(formula = .outcome ~ ., data = dat, weights = wts) :
  variable lengths differ (found for '(weights)')
I have already checked that the weights and the outcome have the same length, using the code below:
> table(model_weights)
model_weights
0.0000277654375832963 0.000231481481481481
18008 2160
> table(train$y)
0 1
18008 2160
Any idea how to solve this?
Note: I can run the train function without the weights argument.
Answer 0 (score: 0)
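After further debugging, I found that the problem was caused by the sampling option I had set in cctrl1. The weights end up with a different length than the data because I generate them before the resampling is applied.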
So you can fix this simply by removing sampling from trControl. If you still want to resample, you must resample the data yourself before running the following code:
model_weights <- ifelse(train$y == 0,
(1/table(train$y)[1]) * 0.5,
(1/table(train$y)[2]) * 0.5)
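As a rough sketch of the "resample first, then compute weights" order described above, the following uses only base R on toy data. The data frame, the 3:1 down-sampling ratio, and the names train_rs and class_counts are assumptions made for illustration, not part of the original setup; the resulting weights would then be passed to train() together with train_rs and a trControl that has no sampling option.

```r
# Toy imbalanced data standing in for `train` (hypothetical, not the asker's data)
set.seed(42)
train <- data.frame(x1 = rnorm(1000),
                    y  = rep(c(0, 1), times = c(900, 100)))

# Resample manually, outside of train(): keep every minority row and
# down-sample the majority class to an assumed 3:1 ratio
minority_idx <- which(train$y == 1)
majority_idx <- sample(which(train$y == 0), 3 * length(minority_idx))
train_rs <- train[c(minority_idx, majority_idx), ]

# Compute the weights from the resampled data, so there is one weight per row
class_counts <- table(train_rs$y)
model_weights <- ifelse(train_rs$y == 0,
                        (1 / class_counts[["0"]]) * 0.5,
                        (1 / class_counts[["1"]]) * 0.5)

# Check the lengths directly; table() only compares the class counts
stopifnot(length(model_weights) == nrow(train_rs))
```

Comparing length(model_weights) with nrow() of the data that is actually passed to train() is a more direct check than comparing two table() outputs.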