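In R, if we fit a randomForest and then look at variable importance, the %IncMSE values differ depending on whether we read the $importance component or call the importance() function. Why is that?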
> rf.model = randomForest(Weight ~ ., data = XYWeightTrain, importance = TRUE, ntree = 500, xtest = XYWeightTest[, -1], ytest = XYWeightTest[, 1])
> rf.model$importance
                       %IncMSE IncNodePurity
Wrist.Diam           0.8212594      305.3484
Wrist.Girth          2.8674595     1244.2349
Forearm.Girth       14.7491374     6681.7611
Elbow.Diam           1.0207908      427.8362
Bicep.Girth          7.9351242     4636.7848
Shoulder.Girth       9.5574023     5108.2292
Biacromial.Diam      0.9785278      347.9064
Chest.Depth          2.0081676      873.7349
Chest.Diam           1.9936859     1330.1593
Chest.Girth         24.2663570     9815.0322
Navel.Girth          2.1440752      648.8285
Waist.Girth         31.6001879    12512.4992
Pelvic.Breadth       0.5893632      227.1361
Bitrochanteric.Diam  1.1661954      346.4844
Hip.Girth            8.0548212     2178.3831
Thigh.Girth          2.8990134      726.3200
Knee.Diam            3.6684350     1207.6730
Knee.Girth           6.3831885     2258.2849
Calf.Girth           5.6392469     1972.3754
Ankle.Diam           0.7002560      199.7919
Ankle.Girth          2.0712253      684.4244
> importance(rf.model)
                      %IncMSE IncNodePurity
Wrist.Diam           7.541535      305.3484
Wrist.Girth          9.240727     1244.2349
Forearm.Girth       12.534953     6681.7611
Elbow.Diam           8.742194      427.8362
Bicep.Girth          9.966211     4636.7848
Shoulder.Girth      11.263877     5108.2292
Biacromial.Diam      6.680291      347.9064
Chest.Depth         10.196696      873.7349
Chest.Diam           6.846195     1330.1593
Chest.Girth         15.979216     9815.0322
Navel.Girth         12.194066      648.8285
Waist.Girth         20.320096    12512.4992
Pelvic.Breadth       6.575887      227.1361
Bitrochanteric.Diam  9.568542      346.4844
Hip.Girth           20.481270     2178.3831
Thigh.Girth         15.100160      726.3200
Knee.Diam           16.784600     1207.6730
Knee.Girth          19.353398     2258.2849
Calf.Girth          18.927534     1972.3754
Ankle.Diam           6.360296      199.7919
Ankle.Girth          9.868660      684.4244
Answer 0 (score: 3)
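If you simply print the importance component of the model (rf.model$importance), you get the raw importance values. The importance() function, however, has a scale argument that defaults to TRUE, so it returns the permutation-based importance (%IncMSE) divided by its standard error. IncNodePurity is not scaled, which is why that column is identical in both outputs. If you call importance(rf.model, scale=FALSE), the values should match rf.model$importance.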
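The relationship can be checked directly. The following is a minimal sketch, not the asker's actual data: it uses the built-in mtcars dataset as a stand-in for XYWeightTrain, and the variable names (rf, raw, scaled, unscaled) are made up for illustration. It assumes the fitted object exposes the standard errors as importanceSD, as documented for randomForest.

```r
library(randomForest)

set.seed(1)
# mtcars is a stand-in dataset (assumption); any regression fit with importance = TRUE works
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE, ntree = 500)

raw      <- rf$importance                  # raw values stored on the fitted object
scaled   <- importance(rf)                 # scale = TRUE is the default
unscaled <- importance(rf, scale = FALSE)  # scaling switched off

# With scale = FALSE, the function should return exactly the stored matrix
same_as_stored <- isTRUE(all.equal(raw, unscaled))

# With scale = TRUE, %IncMSE should be the raw value divided by its standard error
scaled_by_se <- isTRUE(all.equal(scaled[, "%IncMSE"],
                                 raw[, "%IncMSE"] / rf$importanceSD,
                                 check.attributes = FALSE))

# IncNodePurity is left untouched by scaling
purity_unchanged <- isTRUE(all.equal(scaled[, "IncNodePurity"],
                                     raw[, "IncNodePurity"]))

print(c(same_as_stored, scaled_by_se, purity_unchanged))
```

All three checks should come out TRUE, provided none of the standard errors is numerically zero (in that case importance() leaves the corresponding raw value undivided).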
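I strongly recommend using %IncMSE rather than IncNodePurity (the node-impurity measure, which is the regression counterpart of Gini importance). %IncMSE is obtained by permuting each variable in the out-of-bag data and measuring the resulting increase in MSE, and it is a more stable representation of variable importance.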