set.seed: resamples generated in a for loop

Time: 2019-02-21 15:45:31

Tags: r for-loop cross-validation r-caret resampling

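I want to generate identical resamples when calling the train() function of the caret package inside a for loop. The modeling method is random forest (rf).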

Here is my code:

# for reproducible results in runif() and sample()
set.seed(43)
dat <- data.frame(x = runif(300, min=0, max=100),
                  y = runif(300, min=0, max=10),
                  z = runif(300, min=0, max=5),
                  p = runif(300),
                  a = floor(runif(300, min=0, max=101)),
                  b = sample( LETTERS[1:4], 300, replace=TRUE, prob=c(0.2, 0.2, 0.4, 0.2)),
                  c = sample( LETTERS[5:8], 300, replace=TRUE, prob=c(0.2, 0.2, 0.4, 0.2)))

targets <- c("x","y")
predictors_name <- c("z", "p", "a", "b", "c")
model_list <- list()

library(caret)

for (i in seq_along(targets)) {

  # set predictors and response
  pred <- dat[, predictors_name]
  response <- dat[[targets[i]]]

  # set parameter mtry as a data frame (so the tuneGrid parameter of train will take it)
  params <- data.frame(mtry = 4)

  # specify trainControl
  control <- trainControl(method = "repeatedcv", number = 10, repeats = 10, savePredictions = TRUE)

  # fit models with fixed hyperparameter 
  set.seed(43)
  model <- train(x = pred,
                 y = response,  
                 method = "rf",
                 ntree = 25,
                 metric = "RMSE",
                 tuneGrid = params,
                 trControl = control,
                 importance = TRUE)

  model_list[[i]] <- model

}

Now I want to merge the two data frames by rowIndex and Resample without producing NAs, but the values in the Resample column do not match:

d1 <- model_list[[1]]$pred
d2 <- model_list[[2]]$pred

d1[d1$rowIndex == 1,]
d2[d2$rowIndex == 1,]

The Resample column of d1 and d2 should contain exactly the same values.
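The mismatch is consistent with how caret builds folds for a numeric outcome. caret's documentation for createFolds() states that for a numeric y the sample is split into percentile-based groups and sampling is done within those groups; assuming method = "repeatedcv" goes through that routine (via createMultiFolds()), the fold assignment depends on the response values and not only on the seed. Since x and y differ, the same seed can still put a row into different folds. The sketch below imitates that idea in base R; stratified_folds() is a hypothetical, simplified helper and not caret's actual implementation.

```r
# Hypothetical, simplified stand-in for outcome-stratified fold assignment:
# bin a numeric outcome into quartile groups, then deal fold labels out
# randomly within each group. This is NOT caret's source code.
stratified_folds <- function(y, k = 10, seed = 43) {
  set.seed(seed)
  # quartile breaks of the outcome (unique() guards against tied quantiles)
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = 5)))
  groups <- cut(y, breaks, include.lowest = TRUE)
  folds <- integer(length(y))
  for (g in levels(groups)) {
    idx <- which(groups == g)
    # deal balanced fold labels 1..k to this group's rows in random order
    folds[idx] <- rep_len(seq_len(k), length(idx))[sample.int(length(idx))]
  }
  folds
}

set.seed(43)
x <- runif(300, min = 0, max = 100)
y <- runif(300, min = 0, max = 10)

fx <- stratified_folds(x)  # seed 43, outcome x
fy <- stratified_folds(y)  # seed 43, outcome y
identical(fx, fy)          # compares the two fold assignments
mean(fx == fy)             # share of rows assigned to the same fold
```

Resetting the seed makes each call reproducible for a given outcome, but because the grouping step looks at the outcome values, two different targets generally yield two different assignments.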

Why do the resamples produced in the loop differ even though I call set.seed() right before train()? How can I produce identical resamples?
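One possible way to get identical resamples, sketched here as an untested assumption rather than a confirmed fix, is to build the resampling indices once, outside the loop and independent of the response, and give the same list to every train() call through the index argument of trainControl(). The helper make_repeated_folds() below is hypothetical; it returns a named list of training-row vectors, the shape that index expects, with caret-style names such as "Fold01.Rep01".

```r
# Hypothetical helper: build repeated k-fold training indices once, based only
# on the number of rows, so every target in the loop can reuse the same splits.
make_repeated_folds <- function(n, k = 10, times = 10, seed = 43) {
  set.seed(seed)
  out <- list()
  for (r in seq_len(times)) {
    # balanced fold labels 1..k, shuffled without looking at any outcome
    fold_id <- rep_len(seq_len(k), n)[sample.int(n)]
    for (f in seq_len(k)) {
      # each element holds the TRAINING rows, i.e. all rows not in fold f
      out[[sprintf("Fold%02d.Rep%02d", f, r)]] <- which(fold_id != f)
    }
  }
  out
}

folds <- make_repeated_folds(n = 300)
length(folds)           # 100 resamples (10 folds x 10 repeats)
unique(lengths(folds))  # 270 training rows in each
```

In the loop above, the list would be created once before the for statement (for example `folds <- make_repeated_folds(nrow(dat))`) and passed as `trainControl(method = "repeatedcv", number = 10, repeats = 10, index = folds, savePredictions = TRUE)`. Calling caret's createMultiFolds() once on a single fixed vector and reusing its result should serve the same purpose. With identical splits, the Resample labels of d1 and d2 would be expected to line up.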

0 answers:

No answers yet.