Question

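I am running kNN regression on my data and would like to:

a) cross-validate with repeatedcv to find the optimal k;

b) use PCA with a 90% variance threshold to reduce dimensionality when building the kNN model.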

library(caret)
library(dplyr)
set.seed(0)
data = cbind(rnorm(20, 100, 10), matrix(rnorm(400, 10, 5), ncol = 20)) %>% 
  data.frame()
colnames(data) = c('True', paste0('Day',1:20))
tr = data[1:15, ] #training set
tt = data[16:20,] #test set

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)
k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          #trying to find the optimal k from 1:10
          trControl  = train.control, 
          preProcess = c('scale','pca'),
          metric     = "RMSE",
          data       = tr)

My questions:

(1) I noticed that someone suggested changing the PCA parameter in trainControl:

ctrl <- trainControl(preProcOptions = list(thresh = 0.8))
mod <- train(Class ~ ., data = Sonar, method = "pls",
              trControl = ctrl)

If I change the parameter in trainControl, does that mean PCA is still carried out during the kNN fit? (A similar concern is raised in this question.)

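(2) I found another example that fits my situation. I would like to change the threshold to 90%, but I don't know where to set it in caret's train function, especially since I still need the train options.

Apologies for the lengthy description and the scattered references. Thank you in advance!

(Thanks to Camille for the suggestions that made the code run!)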

Answer 1

To answer your questions:

I noticed that someone suggested changing the PCA parameter in trainControl:

mod <- train(Class ~ ., data = Sonar, method = "pls",trControl = ctrl)

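If I change the parameter in trainControl, does that mean PCA is still carried out during the kNN fit?

Yes, provided you also keep 'pca' in the preProcess argument of train, like this: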

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3,preProcOptions = list(thresh = 0.9))

k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          trControl  = train.control, 
          preProcess = c('scale','pca'),
          metric     = "RMSE",
          data       = tr)

You can check this in the preProcess element of the fitted object:

k$preProcess
Created from 15 samples and 20 variables

Pre-processing:
  - centered (20)
  - ignored (0)
  - principal component signal extraction (20)
  - scaled (20)

PCA needed 9 components to capture 90 percent of the variance
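
If you would rather confirm the threshold programmatically than read the printed summary, the stored preProcess object can be inspected directly. This is a minimal sketch, assuming the simulated data from the question and assuming that the thresh and numComp elements of the preProcess object are exposed in your caret version (they are what the printed summary reports):

```r
library(caret)

# Same simulated data as in the question (built without dplyr)
set.seed(0)
data <- data.frame(cbind(rnorm(20, 100, 10),
                         matrix(rnorm(400, 10, 5), ncol = 20)))
colnames(data) <- c("True", paste0("Day", 1:20))
tr <- data[1:15, ]  # training set

# PCA threshold is passed through trainControl's preProcOptions
train.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                              preProcOptions = list(thresh = 0.9))

k <- train(True ~ .,
           data       = tr,
           method     = "knn",
           tuneGrid   = expand.grid(k = 1:10),
           trControl  = train.control,
           preProcess = c("scale", "pca"),
           metric     = "RMSE")

# Threshold actually used and number of components retained
k$preProcess$thresh
k$preProcess$numComp
```

If thresh comes back as 0.9, the option set in trainControl reached the PCA step.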

As for question (2), the alternative is to run preProcess separately and train on the transformed data:

mdl = preProcess(tr[,-1],method=c("scale","pca"),thresh=0.9)
mdl
Created from 15 samples and 20 variables

Pre-processing:
  - centered (20)
  - ignored (0)
  - principal component signal extraction (20)
  - scaled (20)

PCA needed 9 components to capture 90 percent of the variance

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)

k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10), 
          trControl  = train.control,
          metric     = "RMSE",
          data       = predict(mdl,tr))
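
One practical difference between the two approaches shows up at prediction time. The sketch below assumes the tr/tt split from the question; the names fit_inside, fit_outside, ctrl_in and ctrl_out are made up for illustration. When PCA is requested through train, predict applies the stored preprocessing to new data for you; when preProcess is run separately, the test set has to be transformed with the same mdl object before predicting:

```r
library(caret)

# Same simulated data and split as in the question
set.seed(0)
data <- data.frame(cbind(rnorm(20, 100, 10),
                         matrix(rnorm(400, 10, 5), ncol = 20)))
colnames(data) <- c("True", paste0("Day", 1:20))
tr <- data[1:15, ]   # training set
tt <- data[16:20, ]  # test set

# Approach 1: PCA handled inside train()
ctrl_in <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                        preProcOptions = list(thresh = 0.9))
fit_inside <- train(True ~ ., data = tr, method = "knn",
                    tuneGrid = expand.grid(k = 1:10),
                    trControl = ctrl_in,
                    preProcess = c("scale", "pca"),
                    metric = "RMSE")
# Raw test data goes in; the stored preprocessing is applied automatically
pred_inside <- predict(fit_inside, newdata = tt)

# Approach 2: PCA done beforehand with preProcess()
mdl <- preProcess(tr[, -1], method = c("scale", "pca"), thresh = 0.9)
ctrl_out <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
fit_outside <- train(True ~ ., data = predict(mdl, tr), method = "knn",
                     tuneGrid = expand.grid(k = 1:10),
                     trControl = ctrl_out,
                     metric = "RMSE")
# Test data must be projected with the same preProcess object first
pred_outside <- predict(fit_outside, newdata = predict(mdl, tt))
```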
