PCA for KNN: preprocess parameter in caret

Asked: 2019-04-23 13:07:09

Tags: r machine-learning pca knn feature-selection

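I am running knn regression on my data and would like to:

a) use repeated cross-validation to find the optimal k;

b) use PCA with a 90% threshold to reduce dimensionality when building the knn model.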

library(caret)
library(dplyr)
set.seed(0)
data = cbind(rnorm(15, 100, 10), matrix(rnorm(300, 10, 5), ncol = 20)) %>% 
  data.frame()

colnames(data) = c('True', paste0('Day',1:20))
tr = data[1:10, ] #training set
tt = data[11:15,] #test set

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3)
k = train(True ~ .,
          method     = "knn",
          tuneGrid   = expand.grid(k = 1:10),
          trControl  = train.control, 
          preProcess = c('scale','pca'),
          metric     = "RMSE",
          data       = tr)

My question: the PCA threshold currently seems to default to 95% (I am not sure). How can I change it to 80%?

1 Answer:

Answer 0: (score: 0)

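You can try adding the preProcOptions argument to trainControl. Its thresh element sets the cumulative proportion of variance that PCA must retain: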

train.control = trainControl(method = "repeatedcv", number = 5, repeats=3, preProcOptions = list(thresh = 0.80))
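
To check that the threshold has the effect you expect, you can call preProcess() directly on the predictors and compare how many components are kept. This is only a sketch: the data are simulated in the same shape as the question's predictors, and the names x, pp80, pp95, n80 and n95 are made up for illustration.

```r
library(caret)

set.seed(0)
# Simulated predictors shaped like the question's data (15 rows, 20 columns)
x <- data.frame(matrix(rnorm(300, 10, 5), ncol = 20))
colnames(x) <- paste0("Day", 1:20)

# Same preprocessing steps, two different cumulative-variance thresholds
pp80 <- preProcess(x, method = c("scale", "pca"), thresh = 0.80)
pp95 <- preProcess(x, method = c("scale", "pca"), thresh = 0.95)

# Number of principal components retained at each threshold
n80 <- pp80$numComp
n95 <- pp95$numComp
print(c(n80 = n80, n95 = n95))
```

A lower threshold should never keep more components than a higher one, so n80 should be no larger than n95. After fitting, the preprocessing that train() applied should be stored on the fitted object (k$preProcess in the question's code), so the same numComp check can be made there.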