caret leapForward linear regression (feature selection): changing nvmax

Time: 2018-02-14 20:50:37

Tags: r parameters r-caret feature-selection

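I have been using the leapForward method from the leaps package together with caret, and found that it only returns 5 variables. According to the leaps package, you can change nvmax to any subset size you wish.

I can't seem to fit this into the caret wrapper. I have tried putting it into the train statement, as well as creating an expand.grid line, and it doesn't seem to work. Any help would be appreciated!

My code: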

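library(caret)  # method "leapForward" also needs the leaps package installed
data <- read.csv(file = "C:/mydata.csv", header = TRUE, sep = ",")
fitControl <- trainControl(method = "LOOCV")  # leave-one-out cross-validation
x <- data[, -19]  # predictors: every column except the 19th
y <- data[, 19]   # response: the 19th column
lmFit <- train(x = x, y = y, method = "leapForward", trControl = fitControl)
summary(lmFit)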

1 answer:

Answer 0 (score: 0)

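By default, caret evaluates a small, automatically generated grid of values for the tuning parameter (three values, since tuneLength defaults to 3). You can specify your own grid with the tuneGrid option.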

Here is a reproducible example with the BloodBrain dataset. NB: I had to transform the predictors with PCA to avoid multicollinearity problems.

library(caret)
data(BloodBrain, package = "caret")
dim(bbbDescr)
#> [1] 208 134
X <- princomp(bbbDescr)$scores[,1:131]
Y <- logBBB
fitControl <- trainControl(method = "cv")

Default: automatically generated grid for the tuning parameter

lmFit <- train(y = Y, x = X,'leapForward', trControl = fitControl)
lmFit
#> Linear Regression with Forward Selection 
#> 
#> 208 samples
#> 131 predictors
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 187, 188, 187, 187, 187, 187, ... 
#> Resampling results across tuning parameters:
#> 
#>   nvmax  RMSE       Rsquared   MAE      
#>   2      0.6682545  0.2928583  0.5286758
#>   3      0.7008359  0.2652202  0.5527730
#>   4      0.6781190  0.3026475  0.5215527
#> 
#> RMSE was used to select the optimal model using the smallest value.
#> The final value used for the model was nvmax = 2.
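If a regular run of nvmax values is all you need, the tuneLength argument of train() is a lighter alternative to a custom grid. The following is a minimal sketch rather than part of the original example: it assumes the leaps package is installed, keeps only the first 20 principal components to shorten the run (my choice), and uses the illustrative name lmFitLen.

```r
library(caret)  # method "leapForward" also requires the leaps package

data(BloodBrain, package = "caret")

# Same PCA transformation as above, but with fewer components (assumption:
# 20 is enough for illustration and keeps the run time short)
X <- princomp(bbbDescr)$scores[, 1:20]
Y <- logBBB

set.seed(1)  # make the cross-validation folds reproducible
fitControl <- trainControl(method = "cv")

# tuneLength = 10 asks caret to generate 10 candidate values of nvmax
lmFitLen <- train(x = X, y = Y, method = "leapForward",
                  trControl = fitControl, tuneLength = 10)

# Candidate values that were evaluated, and the one that was selected
lmFitLen$results$nvmax
lmFitLen$bestTune
```

Judging by the default output above (nvmax = 2, 3, 4 for a length of 3), the generated values start at 2 and increase by 1, so tuneLength cannot skip values or start at 1; tuneGrid is needed for that.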

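Grid search with a grid of your choice.
Note: expand.grid is not strictly needed here. It is mainly useful for combining several tuning parameters.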

lmFit <- train(y = Y, x = X,'leapForward', trControl = fitControl, 
               tuneGrid = expand.grid(nvmax = seq(1, 30, 2)))
lmFit
#> Linear Regression with Forward Selection 
#> 
#> 208 samples
#> 131 predictors
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 188, 188, 188, 186, 187, 187, ... 
#> Resampling results across tuning parameters:
#> 
#>   nvmax  RMSE       Rsquared    MAE      
#>    1     0.7649633  0.07840817  0.5919515
#>    3     0.6952295  0.27147443  0.5250173
#>    5     0.6482456  0.35953363  0.4828406
#>    7     0.6509919  0.37800159  0.4865292
#>    9     0.6721529  0.35899937  0.5104467
#>   11     0.6541945  0.39316037  0.4979497
#>   13     0.6355383  0.42654189  0.4794705
#>   15     0.6493433  0.41823974  0.4911399
#>   17     0.6645519  0.37338055  0.5105887
#>   19     0.6575950  0.39628133  0.5084652
#>   21     0.6663806  0.39156852  0.5124487
#>   23     0.6744933  0.38746853  0.5143484
#>   25     0.6709936  0.39228681  0.5025907
#>   27     0.6919163  0.36565876  0.5209107
#>   29     0.7015347  0.35397968  0.5272448
#> 
#> RMSE was used to select the optimal model using the smallest value.
#> The final value used for the model was nvmax = 13.
plot(lmFit)
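Once the best nvmax is known, you will usually want to see which predictors were actually selected. The sketch below is an addition, not part of the original answer: it assumes the final model stored by caret is the regsubsets object from leaps, so that its coef() method can be called with the chosen subset size. It also passes the grid as a plain data.frame, since there is only one tuning parameter. The names fit, best and selected are illustrative, and only 20 principal components are kept to shorten the run.

```r
library(caret)  # method "leapForward" also requires the leaps package

data(BloodBrain, package = "caret")
X <- princomp(bbbDescr)$scores[, 1:20]  # assumption: 20 components for speed
Y <- logBBB

set.seed(1)  # make the cross-validation folds reproducible
fit <- train(x = X, y = Y, method = "leapForward",
             trControl = trainControl(method = "cv"),
             # single tuning parameter, so a data.frame is sufficient
             tuneGrid = data.frame(nvmax = seq(1, 15, 2)))

# Subset size chosen by cross-validation
best <- fit$bestTune$nvmax

# Coefficients of the model with `best` predictors (plus the intercept)
selected <- coef(fit$finalModel, id = best)
names(selected)
```

The names returned include "(Intercept)" followed by the selected components, so the vector has best + 1 entries.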

Created on 2018-03-08 by the reprex package (v0.2.0).