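I am trying to combine signals from different models using the example described here. I have different data sets that predict the same output. However, when I combine the model outputs in a caretList and ensemble the signals, I get this error: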
Error in check_bestpreds_resamples(modelLibrary) :
Component models do not have the same re-sampling strategies
Here is a reproducible example:
library(caret)
library(caretEnsemble)

# Two unrelated data sets with different row counts (200 vs. 400)
df1 <-
  data.frame(x1 = rnorm(200),
             x2 = rnorm(200),
             y = as.factor(sample(c("Jack", "Jill"), 200, replace = TRUE)))
df2 <-
  data.frame(z1 = rnorm(400),
             z2 = rnorm(400),
             y = as.factor(sample(c("Jack", "Jill"), 400, replace = TRUE)))

# Each model gets its own trainControl, so each draws its own CV folds
check_1 <- train(x = df1[, 1:2], y = df1[, 3],
                 method = "nnet",
                 tuneLength = 10,
                 trControl = trainControl(method = "cv",
                                          classProbs = TRUE,
                                          savePredictions = TRUE))
check_2 <- train(x = df2[, 1:2], y = df2[, 3],
                 method = "nnet",
                 preProcess = c("center", "scale"),
                 tuneLength = 10,
                 trControl = trainControl(method = "cv",
                                          classProbs = TRUE,
                                          savePredictions = TRUE))

combine <- c(check_1, check_2)
# This call raises the error shown above
ens <- caretEnsemble(combine)
Answer 0 (score: 1)
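First, you are trying to combine two models that were trained on different training data sets. That will not work. All models in an ensemble need to be based on the same training set. Each of your trained models has a different set of resamples, hence your current error.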
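To illustrate why the resamples cannot match, here is a minimal sketch that only draws cross-validation folds for two outcome vectors of the same sizes as in the question. The names y1, y2, folds1 and folds2 are made up for this illustration, and createFolds is used here only to mimic the kind of fold assignment that train performs; it is not a claim about what check_bestpreds_resamples inspects internally.

```r
library(caret)

set.seed(1)
# Outcome vectors with the same sizes as df1 and df2 in the question
y1 <- factor(sample(c("Jack", "Jill"), 200, replace = TRUE))
y2 <- factor(sample(c("Jack", "Jill"), 400, replace = TRUE))

# Training-row indexes for 10-fold CV on each outcome
folds1 <- createFolds(y1, k = 10, returnTrain = TRUE)
folds2 <- createFolds(y2, k = 10, returnTrain = TRUE)

# The second set of folds refers to rows 201..400, which do not exist in
# the first data set, so held-out predictions cannot be lined up row by row
max(unlist(folds1))
max(unlist(folds2))
identical(folds1, folds2)
```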
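Building the models without caretList is also risky, because there is a high chance of ending up with different resampling strategies. You can control this much better by using the index argument of trainControl (see the vignette).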
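As a rough sketch of the index approach, assuming a single data set like df1 from the question: the folds are created once with createFolds and handed to trainControl through its index argument, so every model trained with this control object sees the same resamples. The names folds and ctrl_fixed are illustrative only, and the seed and number of folds are arbitrary choices.

```r
library(caret)

set.seed(1324)
df1 <- data.frame(x1 = rnorm(200),
                  x2 = rnorm(200),
                  y = as.factor(sample(c("Jack", "Jill"), 200, replace = TRUE)))

# Fix the training-row indexes of each fold once, up front
folds <- createFolds(df1$y, k = 5, returnTrain = TRUE)

# Reuse this one control object for every model in the ensemble
ctrl_fixed <- trainControl(method = "cv",
                           number = 5,
                           index = folds,
                           classProbs = TRUE,
                           savePredictions = "final")
```

Passing ctrl_fixed as trControl to each train call, or to caretList, should then give every component model identical resampling indexes.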
If you use a single data set, you can use the following code:
# One shared control object for all models in the ensemble
ctrl <- trainControl(method = "cv",
                     number = 5,
                     classProbs = TRUE,
                     savePredictions = "final")

set.seed(1324)
# will generate the following warning:
# indexes not defined in trControl. Attempting to set them ourselves, so
# each model in the ensemble will have the same resampling indexes.
models <- caretList(x = df1[, 1:2],
                    y = df1[, 3],
                    trControl = ctrl,
                    tuneList = list(
                      check_1 = caretModelSpec(method = "nnet", tuneLength = 10),
                      check_2 = caretModelSpec(method = "nnet", tuneLength = 10,
                                               preProcess = c("center", "scale"))
                    ))

ens <- caretEnsemble(models)
ens  # print the ensemble summary
A glm ensemble of 2 base models: nnet, nnet
Ensemble results:
Generalized Linear Model
200 samples
2 predictor
2 classes: 'Jack', 'Jill'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
Resampling results:
Accuracy Kappa
0.5249231 0.04164767
Also read this guide on the different ensembling strategies.