Question

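I wrote a function inside lapply that fits a GAM (with splines) for each element in a vector of response variables from a data frame. I chose to fit the models with caret rather than calling the mgcv or gam packages directly, because I eventually want to split the data into training/test sets for validation and to try various resampling techniques. For now, I have simply set the trainControl method to 'none', as follows: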

# Set resampling method
# tc <- trainControl(method = "boot", number = 100)
# tc <- trainControl(method = "repeatedcv", number = 10, repeats = 1)
tc <- trainControl(method = "none")

fm <- lapply(group, function(x) {
  printFormula <- paste(x, "~", inf.factors)
  inputFormula <- as.formula(printFormula)
  # Partition input data for model training and testing
  # dpart <- createDataPartition(mdata[, x], times = 1, p = 0.7, list = FALSE)
  # train <- mdata[dpart, ]
  # test <- mdata[-dpart, ]

  cat("Fitting:", printFormula, "\n")
  # gam(inputFormula, family = binomial(link = "logit"), data = mdata)
  train(inputFormula, family = binomial(link = "logit"), data = mdata, method = "gam",
        trControl = tc)
})

When I run this code, I get the following error:

Error in train.default(x, y, weights = w, ...) : 
  Only one model should be specified in tuneGrid with no resampling

If I rerun the code in debug mode, I can see where caret stops the training process:

if (trControl$method == "none" && nrow(tuneGrid) != 1) 
    stop("Only one model should be specified in tuneGrid with no resampling")

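Clearly train is failing because of the second condition, but when I look up the tuning parameters for a GAM (with splines), there are only two: select (feature selection, which I'm not interested in, since I want to keep all predictors in every model) and method. So I don't include a tuneGrid data frame when I call train. Is that why the model fails this way? Which parameters should I supply, and what should the tuneGrid look like?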

I should add that the models train successfully when I use bootstrapping or k-fold CV, but those resampling methods take much longer to compute and I don't need them.

Any help with this would be greatly appreciated!

Answer 1

For this model, the default tuning grid evaluates two values of the select parameter:

> getModelInfo("gam", regex = FALSE)[[1]]$grid
function(x, y, len = NULL, search = "grid") {
   if(search == "grid") {
      out <- expand.grid(select = c(TRUE, FALSE), method = "GCV.Cp")
   } else {
      out <- data.frame(select = sample(c(TRUE, FALSE), size = len, replace = TRUE),
                         method = sample(c("GCV.Cp", "ML"), size = len, replace = TRUE))
   }
    out[!duplicated(out),]
 }
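To see why the default grid trips the check shown in the question, the sketch below rebuilds the grid-search branch of that function in base R and passes it through a hypothetical helper, check_grid_for_no_resampling(), which copies only the nrow(tuneGrid) != 1 condition from the debugger output. The helper is illustrative and is not part of caret.

```r
# Hypothetical helper: mirrors only the row-count guard that caret applies
# when trainControl(method = "none") is used. Not a caret function.
check_grid_for_no_resampling <- function(tuneGrid) {
  if (nrow(tuneGrid) != 1)
    stop("Only one model should be specified in tuneGrid with no resampling")
  invisible(TRUE)
}

# Same construction as the search == "grid" branch above:
# both values of `select` crossed with a single value of `method`
default_grid <- expand.grid(select = c(TRUE, FALSE), method = "GCV.Cp")
nrow(default_grid)  # 2 rows, so the guard stops

# A one-row grid fixes select = FALSE (keep all predictors)
single_grid <- data.frame(select = FALSE, method = "GCV.Cp")
nrow(single_grid)   # 1 row, so the guard passes

res_default <- tryCatch(check_grid_for_no_resampling(default_grid),
                        error = function(e) conditionMessage(e))
res_single <- check_grid_for_no_resampling(single_grid)
```

Because train() builds this two-row grid whenever tuneGrid is omitted, the no-resampling check fails even though select was never requested.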

You should pass something like tuneGrid = data.frame(select = FALSE, method = "GCV.Cp") so that only a single model is evaluated (as the error message says).
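A minimal sketch of the question's loop with a one-row tuneGrid added. It assumes caret and mgcv are installed. Because the original data isn't shown, mdata, group, and inf.factors below are simulated, hypothetical stand-ins with binary factor responses.

```r
library(caret)  # method = "gam" also requires the mgcv package

# Hypothetical simulated data standing in for the question's mdata
set.seed(1)
n <- 200
mdata <- data.frame(x1 = runif(n), x2 = runif(n))
mdata$y1 <- factor(ifelse(runif(n) < plogis(-1 + 3 * mdata$x1), "yes", "no"))
mdata$y2 <- factor(ifelse(runif(n) < plogis(2 * mdata$x2 - 1), "yes", "no"))
group <- c("y1", "y2")      # hypothetical response names
inf.factors <- "x1 + x2"    # hypothetical predictor string

tc <- trainControl(method = "none")
# Exactly one row: keep all predictors and use GCV.Cp smoothing selection
single_grid <- data.frame(select = FALSE, method = "GCV.Cp")

fm <- lapply(group, function(x) {
  printFormula <- paste(x, "~", inf.factors)
  cat("Fitting:", printFormula, "\n")
  train(as.formula(printFormula), data = mdata, method = "gam",
        family = binomial(link = "logit"),
        trControl = tc, tuneGrid = single_grid)
})
```

The same single_grid still works if tc is later switched back to bootstrap or repeated CV; the resampling then estimates performance for that one model instead of choosing between grid rows.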

GAM method in caret with no resampling produces a stop error
