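I want to train several models (same algorithm, same predictors), each on a different subset of the data. I then want to combine these models into a single model and use it to predict on new, unseen data. Any pointers or ideas on how to go about this?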
This appears to be a similar question, but in that case the OP wanted to train on a subset X and then predict for the same subset X in the test dataset.
# Split mtcars into train (80%) and test (20%)
set.seed(42)  # arbitrary seed, only so the split is reproducible
index <- sample(seq_len(nrow(mtcars)), floor(0.2 * nrow(mtcars)))
train <- mtcars[-index, ]
test <- mtcars[index, ]
# Fit 10 models on different random subsets of train (same predictors)
models <- list()
for (i in 1:10) {
  models[[i]] <- lm(mpg ~ hp, data = dplyr::sample_n(train, 10))
}
summary(models[[1]])
# R-squared of each model
sapply(models, function(x) summary(x)$r.squared)
# Mean R-squared across all the models
mean(sapply(models, function(x) summary(x)$r.squared))
# I am looking for something like this:
# pred <- predict(combined_model, newdata = test[, -1])
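To make the goal concrete, here is a minimal sketch of one possible way to "combine" the models: average the predictions of the individual fits (a simple bagging-style ensemble). This is only an illustration, not necessarily the best approach. The helper name `predict_ensemble` and the seed are my own assumptions, and base R's `sample()` is used in place of `dplyr::sample_n()` so the snippet has no package dependencies. For plain `lm` fits with the same formula, averaging the predictions should give the same result as averaging the coefficients, which the last lines check.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Same split as in the question
index <- sample(seq_len(nrow(mtcars)), floor(0.2 * nrow(mtcars)))
train <- mtcars[-index, ]
test  <- mtcars[index, ]

# Fit 10 models, each on a random subset of 10 training rows
models <- lapply(seq_len(10), function(i) {
  lm(mpg ~ hp, data = train[sample(nrow(train), 10), ])
})

# Hypothetical helper: average the predictions of all models
predict_ensemble <- function(models, newdata) {
  # One column of predictions per model, one row per observation
  preds <- do.call(cbind, lapply(models, predict, newdata = newdata))
  rowMeans(preds)
}

pred <- predict_ensemble(models, newdata = test[, -1])

# Equivalent for linear models: average the coefficients instead
avg_coef  <- rowMeans(sapply(models, coef))
pred_coef <- avg_coef[["(Intercept)"]] + avg_coef[["hp"]] * test$hp
all.equal(unname(pred), pred_coef)
```

The prediction-averaging version carries over to other model types that have a `predict` method, whereas the coefficient-averaging shortcut only applies to models that are linear in their coefficients.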