Error "variable lengths differ" when assigning the weights argument in caret (R)

Date: 2017-11-12 08:53:36

Tags: r r-caret

I want to apply observation weights in caret using the code below:

model_weights <- ifelse(train$y == 0,
                        (1/table(train$y)[1]) * 0.5,
                        (1/table(train$y)[2]) * 0.5)

xgbT <- train(x = as.matrix(train[,-21]), y = make.names(as.factor(train$y)), 
              method = "xgbTree", 
              trControl = cctrl1,
              metric = "MCC",
              maximize = TRUE,
              weights = model_weights,
              preProc = c("center", "scale"),
              tuneGrid = expand.grid(nrounds = c(150), #number of trees
                                    max_depth = c(7), #max tree depth
                                    eta = c(0.03), #learning rate
                                    gamma = c(0.3), #min split loss
                                    colsample_bytree = c(0.7),
                                    min_child_weight = c(10, 1, 5), #min number of instances in the leaf
                                    subsample = c(0.6)), #subsample ratio of the training instance
              early_stop_round = c(3), #if no improvements over specified rounds
              objective = c("binary:logistic"),
              silent = 0)

However, it gives me this error:

Error in model.frame.default(formula = .outcome ~ ., data = dat, weights = wts) : variable lengths differ (found for '(weights)')

This happens even though I have checked that the lengths match, using the code below:

> table(model_weights)
model_weights
0.0000277654375832963  0.000231481481481481 
                18008                  2160 
> table(train$y)

    0     1 
18008  2160 
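The two values in that table are 0.5 / 18008 and 0.5 / 2160: every row of a class gets 0.5 divided by that class's size, so each class contributes half of the total weight. Comparing the two tables checks class counts, while the error is about the number of rows, so a direct length check is more to the point. A minimal base-R sketch of the same formula on a hypothetical toy outcome vector `y` (the names `y`, `counts` and `w` are illustrative, not from the question):

```r
# Hypothetical toy outcome: 8 negatives and 2 positives (imbalanced like train$y)
y <- c(rep(0, 8), rep(1, 2))

# Same formula as in the question: 0.5 divided by the size of each class.
# table() sorts the levels, so counts[1] is class 0 and counts[2] is class 1.
counts <- table(y)
w <- ifelse(y == 0,
            (1 / counts[1]) * 0.5,
            (1 / counts[2]) * 0.5)

# One weight per row: compare against the number of rows, not class counts
length(w) == length(y)   # TRUE

# Each class contributes half of the total weight
tapply(w, y, sum)        # 0.5 for class 0 and 0.5 for class 1
```

For the real data, the equivalent check would be length(model_weights) == nrow(train).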

Any idea how to fix this?

Note: I can run the train function without the weights argument.

1 answer:

Answer 0 (score: 0)

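After further debugging, I found that the problem was caused by the sampling option I had set in cctrl1. The weights end up with a different length because I generate them before the resampling is applied, so they no longer match the rows caret actually fits on.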
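The mismatch can be reproduced without caret or xgboost. The sketch below is an illustration of the mechanism, not caret's internal code: it computes weights on a hypothetical full data set, then down-samples the majority class afterwards (standing in for whatever sampling option cctrl1 used, which the question does not show), and finally builds a model frame from the smaller data with the original weights. That is the same kind of model.frame() call that appears in the error message. All object names are hypothetical.

```r
set.seed(1)

# Hypothetical full training data: 8 rows of class 0, 2 rows of class 1
full <- data.frame(x = rnorm(10), y = factor(c(rep(0, 8), rep(1, 2))))

# Weights are computed on the FULL data, as in the question (length 10)
counts <- table(full$y)
wts <- ifelse(full$y == "0", 0.5 / counts[1], 0.5 / counts[2])

# Simple down-sampling: keep every minority row plus an equal number of
# randomly chosen majority rows (4 rows in total)
keep <- c(sample(which(full$y == "0"), 2), which(full$y == "1"))
small <- full[keep, ]

# Model frame from the down-sampled rows, but with the old 10-element weights
msg <- tryCatch(
  {
    model.frame(y ~ ., data = small, weights = wts)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # the same "variable lengths differ (found for '(weights)')" error
```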

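So you can fix this simply by removing sampling from trControl. If you still want resampling, you have to resample the data yourself before running the following code: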

model_weights <- ifelse(train$y == 0,
                        (1/table(train$y)[1]) * 0.5,
                        (1/table(train$y)[2]) * 0.5)
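The order the answer recommends (resample first, then compute the weights on the resampled rows) can be sketched in base R. This assumes simple down-sampling of the majority class done by hand; the data and object names are hypothetical, and the real resampling step could use any method as long as it runs before the weights are computed.

```r
set.seed(1)

# Hypothetical full training data, imbalanced 8:2
full <- data.frame(x = rnorm(10), y = factor(c(rep(0, 8), rep(1, 2))))

# 1) Resample FIRST: keep all minority rows plus an equal number of majority rows
keep <- c(sample(which(full$y == "0"), 2), which(full$y == "1"))
resampled <- full[keep, ]

# 2) THEN compute the weights with the answer's formula on the resampled data
counts <- table(resampled$y)
model_weights <- ifelse(resampled$y == "0",
                        (1 / counts[1]) * 0.5,
                        (1 / counts[2]) * 0.5)

# The weights now line up row for row with the data, so this succeeds
stopifnot(length(model_weights) == nrow(resampled))
mf <- model.frame(y ~ ., data = resampled, weights = model_weights)
nrow(mf)  # 4
```

In the actual train() call, the resampled data and these weights would be passed together, with a trainControl object that no longer sets sampling, so caret does not change the row count again. With a perfectly balanced down-sample like this one, the formula gives every row the same weight; the weights only differ when the resampled classes are still unequal in size.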