caret method = "rf" warning message: invalid mtry: reset to within valid range

Time: 2018-03-09 03:44:52

Tags: r random-forest r-caret

I am working on a Coursera machine learning project. The goal is to build a predictive model for the following dataset.

> summary(training)
   roll_belt        pitch_belt          yaw_belt       total_accel_belt  gyros_belt_x      
 Min.   :-28.90   Min.   :-55.8000   Min.   :-180.00   Min.   : 0.00    Min.   :-1.040000  
 1st Qu.:  1.10   1st Qu.:  1.7600   1st Qu.: -88.30   1st Qu.: 3.00    1st Qu.:-0.030000  
 Median :113.00   Median :  5.2800   Median : -13.00   Median :17.00    Median : 0.030000  
 Mean   : 64.41   Mean   :  0.3053   Mean   : -11.21   Mean   :11.31    Mean   :-0.005592  
 3rd Qu.:123.00   3rd Qu.: 14.9000   3rd Qu.:  12.90   3rd Qu.:18.00    3rd Qu.: 0.110000  
 Max.   :162.00   Max.   : 60.3000   Max.   : 179.00   Max.   :29.00    Max.   : 2.220000  
  gyros_belt_y       gyros_belt_z      accel_belt_x       accel_belt_y     accel_belt_z     magnet_belt_x  
 Min.   :-0.64000   Min.   :-1.4600   Min.   :-120.000   Min.   :-69.00   Min.   :-275.00   Min.   :-52.0  
 1st Qu.: 0.00000   1st Qu.:-0.2000   1st Qu.: -21.000   1st Qu.:  3.00   1st Qu.:-162.00   1st Qu.:  9.0  
 Median : 0.02000   Median :-0.1000   Median : -15.000   Median : 35.00   Median :-152.00   Median : 35.0  
 Mean   : 0.03959   Mean   :-0.1305   Mean   :  -5.595   Mean   : 30.15   Mean   : -72.59   Mean   : 55.6  
 3rd Qu.: 0.11000   3rd Qu.:-0.0200   3rd Qu.:  -5.000   3rd Qu.: 61.00   3rd Qu.:  27.00   3rd Qu.: 59.0  
 Max.   : 0.64000   Max.   : 1.6200   Max.   :  85.000   Max.   :164.00   Max.   : 105.00   Max.   :485.0  
 magnet_belt_y   magnet_belt_z       roll_arm         pitch_arm          yaw_arm          total_accel_arm
 Min.   :354.0   Min.   :-623.0   Min.   :-180.00   Min.   :-88.800   Min.   :-180.0000   Min.   : 1.00  
 1st Qu.:581.0   1st Qu.:-375.0   1st Qu.: -31.77   1st Qu.:-25.900   1st Qu.: -43.1000   1st Qu.:17.00  
 Median :601.0   Median :-320.0   Median :   0.00   Median :  0.000   Median :   0.0000   Median :27.00  
 Mean   :593.7   Mean   :-345.5   Mean   :  17.83   Mean   : -4.612   Mean   :  -0.6188   Mean   :25.51  
 3rd Qu.:610.0   3rd Qu.:-306.0   3rd Qu.:  77.30   3rd Qu.: 11.200   3rd Qu.:  45.8750   3rd Qu.:33.00  
 Max.   :673.0   Max.   : 293.0   Max.   : 180.00   Max.   : 88.500   Max.   : 180.0000   Max.   :66.00  
  gyros_arm_x        gyros_arm_y       gyros_arm_z       accel_arm_x       accel_arm_y    
 Min.   :-6.37000   Min.   :-3.4400   Min.   :-2.3300   Min.   :-404.00   Min.   :-318.0  
 1st Qu.:-1.33000   1st Qu.:-0.8000   1st Qu.:-0.0700   1st Qu.:-242.00   1st Qu.: -54.0  
 Median : 0.08000   Median :-0.2400   Median : 0.2300   Median : -44.00   Median :  14.0  
 Mean   : 0.04277   Mean   :-0.2571   Mean   : 0.2695   Mean   : -60.24   Mean   :  32.6  
 3rd Qu.: 1.57000   3rd Qu.: 0.1400   3rd Qu.: 0.7200   3rd Qu.:  84.00   3rd Qu.: 139.0  
 Max.   : 4.87000   Max.   : 2.8400   Max.   : 3.0200   Max.   : 437.00   Max.   : 308.0  
  accel_arm_z       magnet_arm_x     magnet_arm_y     magnet_arm_z    roll_dumbbell     pitch_dumbbell   
 Min.   :-636.00   Min.   :-584.0   Min.   :-392.0   Min.   :-597.0   Min.   :-153.71   Min.   :-149.59  
 1st Qu.:-143.00   1st Qu.:-300.0   1st Qu.:  -9.0   1st Qu.: 131.2   1st Qu.: -18.49   1st Qu.: -40.89  
 Median : -47.00   Median : 289.0   Median : 202.0   Median : 444.0   Median :  48.17   Median : -20.96  
 Mean   : -71.25   Mean   : 191.7   Mean   : 156.6   Mean   : 306.5   Mean   :  23.84   Mean   : -10.78  
 3rd Qu.:  23.00   3rd Qu.: 637.0   3rd Qu.: 323.0   3rd Qu.: 545.0   3rd Qu.:  67.61   3rd Qu.:  17.50  
 Max.   : 292.00   Max.   : 782.0   Max.   : 583.0   Max.   : 694.0   Max.   : 153.55   Max.   : 149.40  
  yaw_dumbbell      total_accel_dumbbell gyros_dumbbell_x    gyros_dumbbell_y   gyros_dumbbell_z 
 Min.   :-150.871   Min.   : 0.00        Min.   :-204.0000   Min.   :-2.10000   Min.   : -2.380  
 1st Qu.: -77.644   1st Qu.: 4.00        1st Qu.:  -0.0300   1st Qu.:-0.14000   1st Qu.: -0.310  
 Median :  -3.324   Median :10.00        Median :   0.1300   Median : 0.03000   Median : -0.130  
 Mean   :   1.674   Mean   :13.72        Mean   :   0.1611   Mean   : 0.04606   Mean   : -0.129  
 3rd Qu.:  79.643   3rd Qu.:19.00        3rd Qu.:   0.3500   3rd Qu.: 0.21000   3rd Qu.:  0.030  
 Max.   : 154.952   Max.   :58.00        Max.   :   2.2200   Max.   :52.00000   Max.   :317.000  
 accel_dumbbell_x  accel_dumbbell_y  accel_dumbbell_z  magnet_dumbbell_x magnet_dumbbell_y
 Min.   :-419.00   Min.   :-189.00   Min.   :-334.00   Min.   :-643.0    Min.   :-3600    
 1st Qu.: -50.00   1st Qu.:  -8.00   1st Qu.:-142.00   1st Qu.:-535.0    1st Qu.:  231    
 Median :  -8.00   Median :  41.50   Median :  -1.00   Median :-479.0    Median :  311    
 Mean   : -28.62   Mean   :  52.63   Mean   : -38.32   Mean   :-328.5    Mean   :  221    
 3rd Qu.:  11.00   3rd Qu.: 111.00   3rd Qu.:  38.00   3rd Qu.:-304.0    3rd Qu.:  390    
 Max.   : 235.00   Max.   : 315.00   Max.   : 318.00   Max.   : 592.0    Max.   :  633    
 magnet_dumbbell_z  roll_forearm       pitch_forearm     yaw_forearm      total_accel_forearm
 Min.   :-262.00   Min.   :-180.0000   Min.   :-72.50   Min.   :-180.00   Min.   :  0.00     
 1st Qu.: -45.00   1st Qu.:  -0.7375   1st Qu.:  0.00   1st Qu.: -68.60   1st Qu.: 29.00     
 Median :  13.00   Median :  21.7000   Median :  9.24   Median :   0.00   Median : 36.00     
 Mean   :  46.05   Mean   :  33.8265   Mean   : 10.71   Mean   :  19.21   Mean   : 34.72     
 3rd Qu.:  95.00   3rd Qu.: 140.0000   3rd Qu.: 28.40   3rd Qu.: 110.00   3rd Qu.: 41.00     
 Max.   : 452.00   Max.   : 180.0000   Max.   : 89.80   Max.   : 180.00   Max.   :108.00     
 gyros_forearm_x   gyros_forearm_y     gyros_forearm_z    accel_forearm_x   accel_forearm_y 
 Min.   :-22.000   Min.   : -7.02000   Min.   : -8.0900   Min.   :-498.00   Min.   :-632.0  
 1st Qu.: -0.220   1st Qu.: -1.46000   1st Qu.: -0.1800   1st Qu.:-178.00   1st Qu.:  57.0  
 Median :  0.050   Median :  0.03000   Median :  0.0800   Median : -57.00   Median : 201.0  
 Mean   :  0.158   Mean   :  0.07517   Mean   :  0.1512   Mean   : -61.65   Mean   : 163.7  
 3rd Qu.:  0.560   3rd Qu.:  1.62000   3rd Qu.:  0.4900   3rd Qu.:  76.00   3rd Qu.: 312.0  
 Max.   :  3.970   Max.   :311.00000   Max.   :231.0000   Max.   : 477.00   Max.   : 923.0  
 accel_forearm_z   magnet_forearm_x  magnet_forearm_y magnet_forearm_z classe  
 Min.   :-446.00   Min.   :-1280.0   Min.   :-896.0   Min.   :-973.0   A:5580  
 1st Qu.:-182.00   1st Qu.: -616.0   1st Qu.:   2.0   1st Qu.: 191.0   B:3797  
 Median : -39.00   Median : -378.0   Median : 591.0   Median : 511.0   C:3422  
 Mean   : -55.29   Mean   : -312.6   Mean   : 380.1   Mean   : 393.6   D:3216  
 3rd Qu.:  26.00   3rd Qu.:  -73.0   3rd Qu.: 737.0   3rd Qu.: 653.0   E:3607  
 Max.   : 291.00   Max.   :  672.0   Max.   :1480.0   Max.   :1090.0           

To train the model, I did the following:

trainCtrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
rfModel <- train(classe ~., method = "rf", trControl = trainCtrl, preProcess = "pca", data = training, prox = TRUE)

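The model works. However, I am annoyed by a warning message that repeats as many as 20 times: invalid mtry: reset to within valid range. A few Google searches did not turn up anything useful. Also, in case it matters, there are no NA values in the dataset; they were removed in an earlier step.

I also ran system.time(), and the processing time is more than 1 hour.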

> system.time(train(classe ~., method = "rf", trControl = trainCtrl, preProcess = "pca", data = training, prox = TRUE))
    user   system  elapsed 
6478.113  302.281 7044.483 

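It would be great if you could help me decipher what this warning means and why it occurs. I would also welcome any comments on the long processing time.

Thanks!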

1 Answer:

Answer 0: (score: 4)

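The caret rf method uses the randomForest function from the randomForest package. If you set randomForest's mtry argument to a value greater than the number of predictors, you get the warning you posted (for example, try rf = randomForest(mpg ~ ., mtry=15, data=mtcars)). The model still runs, but randomForest resets mtry to a lower, valid value.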
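As a minimal sketch of the behavior described above, the snippet below fits a forest on mtcars with an oversized mtry, collects the warning text, and prints the mtry value stored on the fitted object. The names n_predictors and warning_messages, and the choice of ntree = 50, are illustrative assumptions and not part of the original post.

```r
library(randomForest)

set.seed(1)

# mtcars has 11 columns; with mpg as the response, 10 remain as predictors.
n_predictors <- ncol(mtcars) - 1

# Fit with an mtry larger than the number of predictors and record
# every warning message instead of letting it print to the console.
warning_messages <- character(0)
rf <- withCallingHandlers(
  randomForest(mpg ~ ., data = mtcars, mtry = 15, ntree = 50),
  warning = function(w) {
    warning_messages <<- c(warning_messages, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warning_messages)  # should include the "invalid mtry" message
print(rf$mtry)           # the mtry value the fitted forest reports
```

Comparing rf$mtry with n_predictors is a quick way to confirm that the requested value was replaced by one inside the valid range.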

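The question is why train (or one of the functions it calls) passes randomForest an mtry value that is too large. I'm not sure, but here's a guess: setting preProcess="pca" reduces the number of features fed to randomForest (relative to the number of features in the raw data), because the least important principal components are discarded to reduce the dimensionality of the feature set. However, during cross-validation, train may be setting the maximum mtry value for randomForest based on the larger number of features in the raw data, rather than on the preprocessed dataset that is actually fed to randomForest. Circumstantial evidence for this is that the warning goes away if you remove the preProcess="pca" argument, but I haven't checked further.

Reproducible code showing that the warning goes away without pca: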
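One way to probe this guess, sketched here under assumptions, is to compare the number of raw predictors with the number of principal components that caret's preProcess keeps at its default variance threshold, and then build an mtry grid that stays within the smaller number. The variable names x, pp, n_original, n_components, and grid are illustrative. The sketch runs PCA once on the full data, whereas train re-estimates it inside each resample, so the component count may differ from fold to fold.

```r
library(caret)

# Predictors only (everything except the response mpg).
x <- mtcars[, setdiff(names(mtcars), "mpg")]

# PCA preprocessing with caret's default settings, run once on the full data.
pp <- preProcess(x, method = "pca")

n_original   <- ncol(x)     # predictors in the raw data
n_components <- pp$numComp  # principal components retained by preProcess

print(c(original = n_original, pca_components = n_components))

# Candidate mtry values that never exceed the retained component count.
grid <- expand.grid(mtry = seq_len(n_components))
print(grid)
```

If n_components is smaller than n_original, any mtry derived from the raw predictor count can exceed what randomForest actually receives, which would be consistent with the guess. Passing grid as the tuneGrid argument to train is one possible way to keep mtry in range, although warnings could still appear in folds where fewer components are retained.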

library(caret)
trainCtrl <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
# With PCA preprocessing: the "invalid mtry" warnings appear
rfModel <- train(mpg ~., method = "rf", trControl = trainCtrl, preProcess = "pca", data = mtcars, prox = TRUE)
# Without PCA preprocessing: no warning
rfModel <- train(mpg ~., method = "rf", trControl = trainCtrl, data = mtcars, prox = TRUE)