Ensembling different datasets in R

Date: 2018-04-09 05:08:27

Tags: r r-caret ensemble-learning

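I am trying to combine signals from different models, following the example described here. I have different datasets that predict the same output. However, when I combine the model outputs in a caretList and ensemble the signals, I get this error: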

Error in check_bestpreds_resamples(modelLibrary) : 
  Component models do not have the same re-sampling strategies

Here is a reproducible example:

library(caret)
library(caretEnsemble)

# Two datasets with different predictors and different row counts,
# both predicting the same two-class outcome y
df1 <-
  data.frame(x1 = rnorm(200),
             x2 = rnorm(200),
             y = as.factor(sample(c("Jack", "Jill"), 200, replace = TRUE)))

df2 <-
  data.frame(z1 = rnorm(400),
             z2 = rnorm(400),
             y = as.factor(sample(c("Jack", "Jill"), 400, replace = TRUE)))

# Model 1: neural net trained on df1
check_1 <- train(x = df1[, 1:2], y = df1[, 3],
                 method = "nnet",
                 tuneLength = 10,
                 trControl = trainControl(method = "cv",
                                          classProbs = TRUE,
                                          savePredictions = TRUE))

# Model 2: neural net trained on df2, with centering and scaling
check_2 <- train(x = df2[, 1:2], y = df2[, 3],
                 method = "nnet",
                 preProcess = c("center", "scale"),
                 tuneLength = 10,
                 trControl = trainControl(method = "cv",
                                          classProbs = TRUE,
                                          savePredictions = TRUE))

# Combine the two models and ensemble them; the last call raises the error above
combine <- c(check_1, check_2)
ens <- caretEnsemble(combine)

1 Answer:

Answer 0 (score: 1)

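First, you are trying to combine 2 models that were trained on different training datasets. That will not work. All models in an ensemble need to be based on the same training set. Otherwise each trained model has its own set of resamples, which is what causes your current error.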
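To illustrate why the resamples cannot match, here is a minimal sketch that builds cross-validation folds for two outcome vectors of different lengths with caret's createFolds. It is only an approximation of what happens inside train; the names y1, y2, idx1 and idx2 are made up for this illustration.

```r
library(caret)

set.seed(1)
# Outcomes shaped like df1$y (200 rows) and df2$y (400 rows)
y1 <- factor(sample(c("Jack", "Jill"), 200, replace = TRUE))
y2 <- factor(sample(c("Jack", "Jill"), 400, replace = TRUE))

# Training-row indices for 10-fold CV on each dataset
idx1 <- createFolds(y1, k = 10, returnTrain = TRUE)
idx2 <- createFolds(y2, k = 10, returnTrain = TRUE)

# The folds index into different row ranges, so they can never be identical
range(unlist(idx1))     # stays within 1..200
range(unlist(idx2))     # stays within 1..400
identical(idx1, idx2)   # FALSE
```

Because the ensemble is fitted on the models' held-out predictions, those predictions have to refer to the same rows in every component model.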

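Building the models without caretList is also risky, because there is a big chance of ending up with different resampling strategies. You can control this better by using the index argument of trainControl (see the vignette).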
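As a rough sketch of the index approach, the folds can be created once and handed to trainControl, so that every model trained with the same control object is resampled on the same rows. The names y, folds and shared_ctrl are placeholders for this illustration, and y stands in for the outcome column of a single shared training set.

```r
library(caret)

set.seed(1324)
# Stand-in for the outcome column of the one shared training set
y <- factor(sample(c("Jack", "Jill"), 200, replace = TRUE))

# Fix the training-row indices of each of the 5 folds up front
folds <- createFolds(y, k = 5, returnTrain = TRUE)

# Passing the same control object to every train() call
# gives all models identical resamples
shared_ctrl <- trainControl(method = "cv",
                            number = 5,
                            index = folds,
                            classProbs = TRUE,
                            savePredictions = "final")
```

This only helps when all models are trained on the same rows; it does not make models trained on df1 and df2 compatible.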

If you use 1 dataset, you can use the following code:

ctrl <- trainControl(method = "cv",
                     number = 5,
                     classProbs = TRUE,
                     savePredictions = "final")

set.seed(1324)
# will generate the following warning:
# indexes not defined in trControl.  Attempting to set them ourselves, so 
# each model in the ensemble will have the same resampling indexes.
models <- caretList(x = df1[,1:2],
                    y = df1[,3] ,
                    trControl = ctrl,
                    tuneList = list(
                      check_1 = caretModelSpec(method = "nnet", tuneLength = 10),
                      check_2 = caretModelSpec(method = "nnet", tuneLength = 10, preProcess = c("center", "scale"))
                    )) 


ens <- caretEnsemble(models)


A glm ensemble of 2 base models: nnet, nnet

Ensemble results:
Generalized Linear Model 

200 samples
  2 predictor
  2 classes: 'Jack', 'Jill' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
Resampling results:

  Accuracy   Kappa     
  0.5249231  0.04164767

Also read this guide on different ensembling strategies.