Question

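I want to select variables based on the training error. For that reason I set method to "none" in trainControl. However, if I run the function below twice, I get two different errors (correctness rates). In this example the difference is negligible, but I would not expect any difference at all.

Does anyone know where this difference comes from?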

library(caret)

c_1 <- trainControl(method = "none")

maxvar    <- 4
direction <- "forward"
tune_1    <- data.frame(maxvar, direction)

tr <- train(Species ~ ., data = iris, method = "stepLDA", trControl = c_1, tuneGrid = tune_1)

First run:

`stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96;  in: "Petal.Width";  variables (1): Petal.Width 
correctness rate: 0.96667;  in: "Sepal.Width";  variables (2): Petal.Width, Sepal.Width 
correctness rate: 0.97333;  in: "Petal.Length";  variables (3): Petal.Width, Sepal.Width, Petal.Length 
correctness rate: 0.98;  in: "Sepal.Length";  variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length 

 hr.elapsed min.elapsed sec.elapsed 
       0.00        0.00        0.28

Second run:

> train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)->tr
 `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96;  in: "Petal.Width";  variables (1): Petal.Width 
correctness rate: 0.96;  in: "Sepal.Width";  variables (2): Petal.Width, Sepal.Width 
correctness rate: 0.96667;  in: "Petal.Length";  variables (3): Petal.Width, Sepal.Width, Petal.Length 
correctness rate: 0.98;  in: "Sepal.Length";  variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length 

 hr.elapsed min.elapsed sec.elapsed 
        0.0         0.0         0.3

Answer 1

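You are still doing 10-fold cross-validation. Unless you set a seed, the folds are drawn at random each time, so you will get slightly different answers whenever you train the model more than once.

If you run this code, including set.seed, you will get the same correctness rates every time.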

set.seed(42)
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)
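
To see the effect of the seed in isolation, here is a minimal sketch that calls klaR's stepclass directly, twice with the same seed, and compares the final correctness rates. The helper name run_once is made up for illustration, and I am assuming the returned object exposes the rate as result.pm, as described in the klaR documentation.

```r
library(MASS)  # provides lda()
library(klaR)  # provides stepclass()

# Hypothetical helper: reseed, run the stepwise selection,
# and return the final cross-validated correctness rate.
run_once <- function(seed) {
  set.seed(seed)
  fit <- stepclass(Species ~ ., data = iris, method = "lda",
                   direction = "forward", maxvar = 4, fold = 10,
                   output = FALSE)
  fit$result.pm
}

# Same seed, same folds, so the two rates should match.
same_seed <- identical(run_once(42), run_once(42))
print(same_seed)
```

Calling run_once with two different seeds may give slightly different rates, which is the behaviour described in the question.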

Edit based on the comments:

The 10-fold cross-validated correctness rate does not come from caret, but from the stepclass function in the klaR package.

stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf, start.vars = NULL, direction = c("both", "forward", "backward"), criterion = "CR", fold = 10, cv.groups = NULL, output = TRUE, min1var = TRUE, ...)

fold: parameter for cross-validation; omitted if 'cv.groups' is specified.

You can change this by passing the fold argument through the train function:

tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1, fold = 1)

But a fold of 1 is meaningless. You will get a bunch of warnings and errors.
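
If the goal is a plain training (resubstitution) error rather than a cross-validated one, one option is to compute it yourself with MASS::lda for a given variable subset. This is only a sketch of that idea, not something stepLDA or stepclass does for you; the subset in the formula is just an example taken from the output in the question.

```r
library(MASS)  # provides lda() and its predict method

# Fit LDA on an example variable subset using all 150 observations.
fit <- lda(Species ~ Petal.Width + Sepal.Width, data = iris)

# Predict on the same data the model was fitted on (no cross-validation).
pred <- predict(fit, iris)$class

# Training correctness rate: share of observations classified correctly.
train_acc <- mean(pred == iris$Species)
print(train_acc)
```

Because no random folds are involved, this number is the same on every run. A forward selection based on training error would repeat this for each candidate variable and keep the best one at each step.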
