R randomForest importance

Asked: 2016-03-09 20:16:02

Tags: r random-forest

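It seems that in R, if we fit a randomForest and then look at variable importance, the %IncMSE values differ depending on whether we read them from $importance or call the importance() function. I am not sure why.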

> rf.model = randomForest(Weight ~ ., data = XYWeightTrain, importance = TRUE, ntree = 500, xtest = XYWeightTest[, -1], ytest = XYWeightTest[, 1])
> rf.model$importance
                       %IncMSE IncNodePurity
Wrist.Diam           0.8212594      305.3484
Wrist.Girth          2.8674595     1244.2349
Forearm.Girth       14.7491374     6681.7611
Elbow.Diam           1.0207908      427.8362
Bicep.Girth          7.9351242     4636.7848
Shoulder.Girth       9.5574023     5108.2292
Biacromial.Diam      0.9785278      347.9064
Chest.Depth          2.0081676      873.7349
Chest.Diam           1.9936859     1330.1593
Chest.Girth         24.2663570     9815.0322
Navel.Girth          2.1440752      648.8285
Waist.Girth         31.6001879    12512.4992
Pelvic.Breadth       0.5893632      227.1361
Bitrochanteric.Diam  1.1661954      346.4844
Hip.Girth            8.0548212     2178.3831
Thigh.Girth          2.8990134      726.3200
Knee.Diam            3.6684350     1207.6730
Knee.Girth           6.3831885     2258.2849
Calf.Girth           5.6392469     1972.3754
Ankle.Diam           0.7002560      199.7919
Ankle.Girth          2.0712253      684.4244
> importance(rf.model)
                      %IncMSE IncNodePurity
Wrist.Diam           7.541535      305.3484
Wrist.Girth          9.240727     1244.2349
Forearm.Girth       12.534953     6681.7611
Elbow.Diam           8.742194      427.8362
Bicep.Girth          9.966211     4636.7848
Shoulder.Girth      11.263877     5108.2292
Biacromial.Diam      6.680291      347.9064
Chest.Depth         10.196696      873.7349
Chest.Diam           6.846195     1330.1593
Chest.Girth         15.979216     9815.0322
Navel.Girth         12.194066      648.8285
Waist.Girth         20.320096    12512.4992
Pelvic.Breadth       6.575887      227.1361
Bitrochanteric.Diam  9.568542      346.4844
Hip.Girth           20.481270     2178.3831
Thigh.Girth         15.100160      726.3200
Knee.Diam           16.784600     1207.6730
Knee.Girth          19.353398     2258.2849
Calf.Girth          18.927534     1972.3754
Ankle.Diam           6.360296      199.7919
Ankle.Girth          9.868660      684.4244 

1 answer:

Answer 0 (score: 3)

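If you just print the importance object from the model, you get the raw importance values. When you call the importance() function, however, the scale argument defaults to TRUE, so it returns the permutation importance values divided by their standard errors. That is why only the %IncMSE column differs, while IncNodePurity is identical in both outputs. If you use importance(rf.model, scale=FALSE), the values should be the same as rf.model$importance.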
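The relationship can be checked directly. This is a minimal sketch, assuming the randomForest package is installed; the built-in mtcars data stands in for XYWeightTrain (which is not available here), and the seed is arbitrary.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the run is repeatable
# mtcars is a stand-in for the question's XYWeightTrain data
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE, ntree = 500)

raw      <- rf$importance                  # raw values stored on the fitted model
unscaled <- importance(rf, scale = FALSE)  # should match the stored matrix
scaled   <- importance(rf)                 # scale = TRUE is the default

# Unscaled output equals the stored matrix
print(all.equal(raw, unscaled))

# Scaled %IncMSE is the raw %IncMSE divided by its "standard error",
# which the model stores in importanceSD when importance = TRUE
manual <- raw[, "%IncMSE"] / rf$importanceSD
print(all.equal(unname(scaled[, "%IncMSE"]), unname(manual)))
```

Both comparisons are expected to print TRUE; the IncNodePurity column is never rescaled.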

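I strongly recommend using %IncMSE rather than IncNodePurity (the regression counterpart of the Gini measure, based on the decrease in residual sum of squares at each split). %IncMSE is computed by permuting each variable in the out-of-bag data and measuring the resulting increase in MSE, and in my experience it is a more stable representation of variable importance.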
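To work with the permutation measure alone, importance() accepts type = 1. The following is a short sketch under the same assumptions as before (randomForest installed, mtcars as a stand-in data set, arbitrary seed).

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the run is repeatable
# mtcars is a stand-in data set, not the data from the question
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE, ntree = 500)

# type = 1 selects the permutation-based measure (%IncMSE for regression)
inc_mse <- importance(rf, type = 1, scale = FALSE)

# Rank the predictors from most to least important
ranked <- inc_mse[order(inc_mse[, 1], decreasing = TRUE), , drop = FALSE]
print(ranked)
```

Passing type = 2 instead would return only the IncNodePurity column.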