I want to generate identical resamples when calling the train() function of the caret package inside a for loop. The modelling method is random forest (rf).
Here is my code:
# for reproducible results from runif()
set.seed(43)
dat <- data.frame(x = runif(300, min=0, max=100),
y = runif(300, min=0, max=10),
z = runif(300, min=0, max=5),
p = runif(300),
a = floor(runif(300, min=0, max=101)),
b = sample( LETTERS[1:4], 300, replace=TRUE, prob=c(0.2, 0.2, 0.4, 0.2)),
c = sample( LETTERS[5:8], 300, replace=TRUE, prob=c(0.2, 0.2, 0.4, 0.2)))
targets <- c("x","y")
predictors_name <- c("z", "p", "a", "b", "c")
model_list <- list()
library(caret)
for (i in seq_along(targets)) {
# set predictors and response (use the names in predictors_name;
# dat[, 4:7] would silently drop z)
pred <- dat[, predictors_name]
response <- dat[[targets[i]]]
# set parameter mtry as a dataframe (so the tuneGrid parameter of train will take it)
params <- data.frame(mtry = 4)
# specify trainControl
control <- trainControl(method = "repeatedcv", number = 10, repeats = 10, savePredictions = TRUE)
# fit models with fixed hyperparameter
set.seed(43)
model <- train(x = pred,
y = response,
method = "rf",
ntree = 25,
metric = "RMSE",
tuneGrid = params,
trControl = control,
importance = TRUE)
model_list[[i]] <- model
}
Now I want to merge the two data frames by rowIndex and Resample without producing NAs, but the values in the Resample column do not match:
d1 <- model_list[[1]]$pred
d2 <- model_list[[2]]$pred
d1[d1$rowIndex == 1,]
d2[d2$rowIndex == 1,]
The Resample column of d1 and d2 should contain exactly the same values.
Why do the resamples produced in the loop differ, even though I call set.seed() right before train()?
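A likely cause, assuming caret's default behaviour for method = "repeatedcv": when trainControl() gets no index, train() builds the folds itself with createMultiFolds(y, number, repeats), and for a numeric outcome those folds are stratified on quantile groups of y. The same seed applied to two different response vectors therefore gives folds with the same labels (Fold01.Rep01, ...) but different row indices. This sketch reproduces that with two hypothetical outcome vectors, y1 and y2, drawn like the x and y columns above.

```r
library(caret)

# Two hypothetical outcome vectors of the same length as dat,
# drawn like the x and y columns in the question.
set.seed(43)
y1 <- runif(300, min = 0, max = 100)
y2 <- runif(300, min = 0, max = 10)

# Re-seed before each call, just as the loop does before train().
set.seed(43)
folds1 <- createMultiFolds(y1, k = 10, times = 10)
set.seed(43)
folds2 <- createMultiFolds(y2, k = 10, times = 10)

# Both lists carry the same labels ("Fold01.Rep01" ... "Fold10.Rep10") ...
identical(names(folds1), names(folds2))
# ... but the training rows behind each label generally differ, because
# the stratification groups depend on the values of the outcome.
identical(folds1, folds2)
```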
How can I produce identical resamples?
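One way to get identical resamples, sketched under the assumption that every target is modelled on the same 300 rows: create the folds once with createMultiFolds() and pass them to trainControl() through its index argument, so train() no longer derives folds from each response. Here the folds are built from the row positions (a choice made for this sketch; stratifying on one of the targets would also work), and the loop otherwise mirrors the code in the question. The data frame is rebuilt with stringsAsFactors = TRUE so that b and c are factors regardless of R version.

```r
library(caret)

set.seed(43)
dat <- data.frame(x = runif(300, min = 0, max = 100),
                  y = runif(300, min = 0, max = 10),
                  z = runif(300, min = 0, max = 5),
                  p = runif(300),
                  a = floor(runif(300, min = 0, max = 101)),
                  b = sample(LETTERS[1:4], 300, replace = TRUE, prob = c(0.2, 0.2, 0.4, 0.2)),
                  c = sample(LETTERS[5:8], 300, replace = TRUE, prob = c(0.2, 0.2, 0.4, 0.2)),
                  stringsAsFactors = TRUE)
targets <- c("x", "y")
predictors_name <- c("z", "p", "a", "b", "c")

# Assumption for this sketch: folds come from the row positions, not from
# any single response, so they do not depend on which target is modelled.
set.seed(43)
folds <- createMultiFolds(seq_len(nrow(dat)), k = 10, times = 10)

# Passing `index` makes train() use these training-row sets for every model.
control <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                        index = folds, savePredictions = TRUE)

model_list <- list()
for (i in seq_along(targets)) {
  set.seed(43)  # still controls the randomness inside the forest itself
  model_list[[i]] <- train(x = dat[, predictors_name],
                           y = dat[[targets[i]]],
                           method = "rf",
                           ntree = 25,
                           metric = "RMSE",
                           tuneGrid = data.frame(mtry = 4),
                           trControl = control,
                           importance = TRUE)
}

d1 <- model_list[[1]]$pred
d2 <- model_list[[2]]$pred
merged <- merge(d1, d2, by = c("rowIndex", "Resample"), suffixes = c(".x", ".y"))
# TRUE when every (rowIndex, Resample) pair in d1 is matched in d2
nrow(merged) == nrow(d1)
```

Because both models share one index list, d1 and d2 contain the same (rowIndex, Resample) pairs, so the merge matches every row. The set.seed(43) inside the loop then only affects the random parts of model fitting, not the fold assignment.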