Question

I want to create partial response plots, as shown here. I trained my randomForest model as follows (12 features and 1 class variable in total):

fit <- randomForest(as.factor(Y) ~ TIME_1 + TIME_2 + TIME_3 + DURATION_1 + DURATION_2 + DURATION_3 +
                    VALUE_1 + VALUE_2 + VALUE_3 +
                    Weekday_1 + Weekday_2 + Weekday_3,
                    data=train, 
                    importance=TRUE, 
                    ntree=50)

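Then I run the code below to produce the plots, but it seems the variable names cannot be detected. In particular, for some reason importanceOrder returns values such as 102, even though I only have 12 features.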

importanceOrder=order(-fit$importance)
importanceOrder
  [1] 102 108 101 107 111 129 117 109 100 132 106 110 105 118 122 127 104 130 123 125 103 124 121
 [24] 116 115 119 120 126 131 128 112 113 114  36  42  45  35  41  38  63  69  66  34  68  44  75
 [47]  74  64  61  58  96  43  99  78  30   2  33  67  37   8  49   1  40  71   3  76  50  73   7
 [70]  10  91  51  94   9  97  70  77  25  83  27  28  53   4  82  39  31  59  17  84  93  19  18
 [93]   5  92  26  16  85  86  54  11  72  29  20  95  55  56  87  88  22  24  90  89  21  23  48
[116]  46  57  79  81  32  13   6  15  14  98  80  12  65  47  62  52  60

names=rownames(fit$importance)[importanceOrder][1:15]
names
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

par(mfrow=c(5, 3), xpd=NA)
for (name in names)
  partialPlot(fit, train, eval(name), main=name, xlab=name, ylim=c(-.2,.9))

Error in `[.data.frame`(pred.data, , xname) : undefined columns selected

Answer 1

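I think it becomes clearer why this doesn't work if you look at the structure of fit$importance. It is a matrix with one row per feature and one column per class, plus MeanDecreaseAccuracy and MeanDecreaseGini, so calling order() on the whole matrix returns indices into all of its cells rather than into its rows.

You want to order by only the last column of fit$importance, not by the whole array.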
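To make the indexing problem concrete, here is a minimal sketch in base R that does not need randomForest at all. The matrix `imp` is a made-up stand-in for fit$importance (3 features, 4 columns); its values and the names `f1`, `class_a`, and so on are assumptions for illustration only.

```r
# Toy stand-in for fit$importance: 3 features (rows) x 4 measures (columns).
# The numbers are invented for illustration.
imp <- matrix(
  c(0.10, 0.30, 0.20,   # class_a
    0.05, 0.25, 0.15,   # class_b
    0.08, 0.28, 0.18,   # MeanDecreaseAccuracy
    5.00, 9.00, 7.00),  # MeanDecreaseGini
  nrow = 3,
  dimnames = list(c("f1", "f2", "f3"),
                  c("class_a", "class_b",
                    "MeanDecreaseAccuracy", "MeanDecreaseGini"))
)

# order() on the whole matrix treats it as one vector of 12 values
# (column by column), so indices run up to nrow * ncol, not nrow.
flat_order <- order(-imp)
max(flat_order)                  # 12, although there are only 3 features
rownames(imp)[flat_order][1:3]   # NA NA NA: indices 11, 12, 10 exceed the 3 row names

# Ordering a single column yields valid row indices.
col_order <- order(-imp[, "MeanDecreaseGini"])
rownames(imp)[col_order]         # "f2" "f3" "f1"
```

The same arithmetic fits the question: 132 indices for 12 features suggests a matrix with 11 columns, and the largest values sit in the last columns, which is why the first indices returned are all above 100.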

library(randomForest)
fit <- randomForest(Species ~ Sepal.Length + Sepal.Width + 
                      Petal.Length + Petal.Width, 
                    data=iris, importance=TRUE, ntree=50)

fit$importance

#                  setosa versicolor  virginica MeanDecreaseAccuracy MeanDecreaseGini
# Sepal.Length 0.05176523 0.03398421 0.05009963           0.04412921        12.464634
# Sepal.Width  0.01846554 0.01564288 0.01006655           0.01486503         3.512521
# Petal.Length 0.23199887 0.23484289 0.33840220           0.27046565        38.386311
# Petal.Width  0.41265955 0.30366844 0.26475770           0.32568744        44.906934

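# Order the features by a single column (decreasing Gini importance)
importanceOrder <- order(-fit$importance[, 'MeanDecreaseGini'])

# Row names in that order; iris has only 4 features, so take all 4
names <- rownames(fit$importance)[importanceOrder][1:4]

# One partial dependence plot per feature on a 2 x 2 grid
par(mfrow=c(2, 2), xpd=NA)
for (name in names) partialPlot(fit, iris, eval(name), main=name, xlab=name)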
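One more detail from the question: it slices `[1:15]` while the model has only 12 features, so even with the ordering fixed the last three names would be NA. A small sketch of one way to guard against that follows. `top_features` is a hypothetical helper, not part of randomForest, and the importance values are made up.

```r
# Hypothetical helper (not part of randomForest): names of the top n
# features by one importance column, capped at the number of features.
top_features <- function(imp, measure = "MeanDecreaseGini", n = 15) {
  ord <- order(imp[, measure], decreasing = TRUE)
  head(rownames(imp)[ord], n)   # head() stops at the end instead of padding with NA
}

# Invented importance values for illustration only.
imp <- cbind(MeanDecreaseAccuracy = c(a = 0.04, b = 0.01, c = 0.27),
             MeanDecreaseGini     = c(a = 12.5, b = 3.5,  c = 38.4))

top_features(imp, n = 15)   # "c" "a" "b"
top_features(imp, n = 2)    # "c" "a"
```

With a real model, the result of `top_features(fit$importance)` could be passed to the plotting loop in place of the hand-sliced `names` vector.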

