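I want to select variables based on the training error, so I set method to "none" in trainControl. However, if I run the call below twice, I get two different sets of errors (correctness rates). The difference in this example is negligible, but I would not have expected any difference at all.
Does anyone know where this difference comes from?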
library(caret)
c_1 <- trainControl(method = "none")
maxvar <- 4
direction <- "forward"
tune_1 <- data.frame(maxvar, direction)
tr <- train(Species ~ ., data = iris, method = "stepLDA", trControl = c_1, tuneGrid = tune_1)
First run:
`stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96; in: "Petal.Width"; variables (1): Petal.Width
correctness rate: 0.96667; in: "Sepal.Width"; variables (2): Petal.Width, Sepal.Width
correctness rate: 0.97333; in: "Petal.Length"; variables (3): Petal.Width, Sepal.Width, Petal.Length
correctness rate: 0.98; in: "Sepal.Length"; variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length
hr.elapsed min.elapsed sec.elapsed
0.00 0.00 0.28
Second run:
tr <- train(Species ~ ., data = iris, method = "stepLDA", trControl = c_1, tuneGrid = tune_1)
`stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
150 observations of 4 variables in 3 classes; direction: forward
stop criterion: assemble 4 best variables.
correctness rate: 0.96; in: "Petal.Width"; variables (1): Petal.Width
correctness rate: 0.96; in: "Sepal.Width"; variables (2): Petal.Width, Sepal.Width
correctness rate: 0.96667; in: "Petal.Length"; variables (3): Petal.Width, Sepal.Width, Petal.Length
correctness rate: 0.98; in: "Sepal.Length"; variables (4): Petal.Width, Sepal.Width, Petal.Length, Sepal.Length
hr.elapsed min.elapsed sec.elapsed
0.0 0.0 0.3
Answer 0 (score: 2)
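You are still performing 10-fold cross-validation. As long as you do not set a seed, you will get slightly different answers each time you train the model.
If you run the code below, including set.seed, you will get the same correctness rates every time.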
set.seed(42)
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1)
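To see why the seed matters, here is a minimal base R sketch of random fold assignment. The helper assign_folds is hypothetical and only illustrates the general idea; it is not klaR's actual internal code.

```r
# Hypothetical helper: randomly assign n observations to k folds.
# Illustration only; not taken from klaR's source.
assign_folds <- function(n, k = 10) {
  sample(rep(seq_len(k), length.out = n))
}

n <- nrow(iris)

# Without a seed, two assignments will almost certainly differ,
# so any cross-validated rate computed from them can differ too.
folds_a <- assign_folds(n)
folds_b <- assign_folds(n)
print(identical(folds_a, folds_b))

# Resetting the same seed before each call reproduces the assignment.
set.seed(42)
folds_c <- assign_folds(n)
set.seed(42)
folds_d <- assign_folds(n)
print(identical(folds_c, folds_d))  # TRUE
```

Note that the seed has to be reset before each call; setting it once and then training twice still gives two different fold assignments.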
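The 10-fold cross-validated correctness rate does not come from caret; it comes from the stepclass function in the klaR package, which trainControl(method = "none") does not affect.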
stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf, start.vars = NULL, direction = c("both", "forward", "backward"), criterion = "CR", fold = 10, cv.groups = NULL, output = TRUE, min1var = TRUE, ...)
fold: parameter for cross-validation; omitted if 'cv.groups' is specified.
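As an alternative to a seed, the cv.groups argument from the signature above can pin the fold membership. The following is a sketch under assumptions: it calls klaR::stepclass directly through its formula interface rather than through caret, the groups vector is my own choice, and it assumes the returned object exposes the selection steps as process.

```r
library(klaR)

# Fixed fold membership (an assumption for illustration):
# observation i goes to fold ((i - 1) %% 10) + 1, so each fold
# contains 5 flowers of every species.
groups <- rep(seq_len(10), length.out = nrow(iris))

run_selection <- function() {
  stepclass(Species ~ ., data = iris, method = "lda",
            direction = "forward", maxvar = 4,
            cv.groups = groups, output = FALSE)
}

r1 <- run_selection()
r2 <- run_selection()

# With fixed folds, both runs should report the same selection
# steps and correctness rates, even though no seed was set.
print(identical(r1$process, r2$process))
```

Because the folds are no longer drawn at random, repeated runs should agree without any call to set.seed.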
If you just want to change the number of folds, you can pass the fold argument to the train function:
tr <- train(Species~., data=iris, method = "stepLDA", trControl=c_1, tuneGrid=tune_1, fold = 1)
But a fold of 1 makes no sense. You will get a bunch of warnings and errors.