Correlation cutoff in caret preProcess

Time: 2018-06-14 08:34:17

Tags: r correlation r-caret preprocessor

I am building a C5.0 model using the caret package in R:

control <- trainControl(method = "repeatedcv", 
                    number = 10, 
                    repeats = 3, 
                    classProbs = TRUE, 
                    sampling = 'smote',
                    returnResamp="all",
                    summaryFunction = twoClassSummary)

grid <- expand.grid(.winnow = c(FALSE, TRUE), 
                 .trials = c(1, 5,10,15,20,25,30,40,45,50), 
                 .model= c("tree"),
                 .splits=c(2,5,10,15,20,25,50))

c5_model <- train(label ~ .,
              data = train,
              trControl = control, 
              method = c5info,
              tuneGrid = grid, 
              preProcess = c("center", "scale", "nzv","corr"),
              verbose = FALSE)

Is it possible to pass a custom cutoff to the preProcess function for the correlation filter, say 0.75 or any other value I want?

1 Answer:

Answer 0: (score: 1)

You can specify the pre-processing options in trainControl, through its preProcOptions argument:

library(caret)
library(mlbench) #for the data
data(Sonar)

ctrl <-trainControl(method = "repeatedcv", 
                    number = 10, 
                    repeats = 3, 
                    classProbs = TRUE, 
                    sampling = 'smote',
                    returnResamp="all",
                    summaryFunction = twoClassSummary,
                    preProcOptions = list(cutoff = 0.75)) # all go in this list

A ranger model trained with this control object:

grid <- expand.grid(.mtry = c(2,5,10),
                    .min.node.size = 2,
                    .splitrule = "gini")

fit_model <- train(Class ~ .,
                  data = Sonar,
                  trControl = ctrl, 
                  metric = "ROC",
                  method = "ranger",
                  tuneGrid = grid,
                  preProcess = c("center", "scale", "nzv","corr"),
                  verbose = FALSE)

fit_model$preProcess
#output
Created from 679 samples and 60 variables

Pre-processing:
  - centered (26)
  - ignored (0)
  - removed (34)
  - scaled (26)

Using a different cutoff (0.6):

ctrl2 <- trainControl(method = "repeatedcv",
                      number = 10,
                      repeats = 3,
                      classProbs = TRUE,
                      sampling = 'smote',
                      returnResamp = "all",
                      summaryFunction = twoClassSummary,
                      preProcOptions = list(cutoff = 0.6))

fit_model2 <- train(Class ~ .,
                    data = Sonar,
                    trControl = ctrl2,
                    metric = "ROC",
                    method = "ranger",
                    tuneGrid = grid,
                    preProcess = c("center", "scale", "nzv", "corr"),
                    verbose = FALSE)

fit_model2$preProcess
#output
Created from 679 samples and 60 variables

Pre-processing:
  - centered (23)
  - ignored (0)
  - removed (37)
  - scaled (23)

More columns were removed (37 instead of 34).

When we use preProcOptions = list(cutoff = 0.95), only 5 columns are removed:

fit_model3$preProcess
#output
Created from 679 samples and 60 variables

Pre-processing:
  - centered (55)
  - ignored (0)
  - removed (5)
  - scaled (55)

Looks like it works.

Similarly, you can pass any other pre-processing option through preProcOptions. They are all documented in ?preProcess.
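
If you want to see what a given cutoff does before training anything, one option is to call preProcess directly on the predictors. The sketch below is only an illustration under a few assumptions: it uses the Sonar data from above, the helper name cols_kept is made up for this example, and it applies only the "corr" step (no "nzv", no SMOTE resampling), so the counts need not match the ones printed by train. The last lines show another option, thresh for PCA, being passed the same way; inside train the equivalent would be preProcOptions = list(thresh = 0.8) together with preProcess = "pca".

```r
library(caret)
library(mlbench)  # only used for the Sonar example data

data(Sonar)
predictors <- Sonar[, names(Sonar) != "Class"]

# Hypothetical helper: number of columns left after applying a
# preProcess() recipe to the predictors.
cols_kept <- function(...) {
  pp <- preProcess(predictors, ...)
  ncol(predict(pp, predictors))
}

# Correlation filter at the three cutoffs discussed above.
cutoffs <- c(0.6, 0.75, 0.95)
removed <- sapply(cutoffs, function(co) {
  ncol(predictors) - cols_kept(method = "corr", cutoff = co)
})
names(removed) <- cutoffs
print(removed)

# Another option passed the same way: the cumulative variance
# threshold used to choose the number of PCA components.
n_components <- cols_kept(method = "pca", thresh = 0.8)
print(n_components)
```

Checking the counts this way is much cheaper than refitting the model for every candidate cutoff, though the final word is still fit_model$preProcess, since that reflects the data train actually saw.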