Question

What am I doing wrong here? What does "subscript out of bounds" mean?

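I got the following code (first block) as an excerpt from a Revolution R webinar on data mining in R. I'm trying to incorporate it into an RF model I ran, but I can't get past what I think is the ordering of the variables. I just want to plot the variable importance.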

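I've included a bit more below to give context, but what actually fails is the third line of code. The second code block is the error I get when I apply it to the data I'm working with. Can anyone help me figure this out?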

#--------------------------------------------------------------------------
# List the importance of the variables.
rn <- round(importance(model.rf), 2)
rn[order(rn[,3], decreasing=TRUE),]
##@# of 
# Plot variable importance
varImpPlot(model.rf, main="",col="dark blue")
title(main="Variable Importance Random Forest weather.csv",
            sub=paste(format(Sys.time(), "%Y-%b-%d %H:%M:%S"), Sys.info()["user"])) 
#--------------------------------------------------------------------------

My error:

> rn[order(rn[,2], decreasing=TRUE),]
Error in order(rn[, 2], decreasing = TRUE) : subscript out of bounds

Answer 1

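I think I understand the confusion. I bet you a 4-finger Kit Kat that if you type ncol(rn), you'll see that rn has 2 columns, not the 3 you might expect. The first "column" you see on the screen isn't really a column: it's just the row names of the object rn. Type rownames(rn) to confirm this. The last column of rn, the one you want to order by, is therefore rn[,2] rather than rn[,3]. The "subscript out of bounds" message appears because you asked R to order by column 3, but rn doesn't have a column 3.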
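To make the row-names point concrete, here is a minimal base R sketch. The matrix below is a hand-built stand-in for rn (an assumption for illustration, not the asker's actual data), with two columns plus row names:

```r
# Hypothetical stand-in for the importance matrix: 3 rows, 2 columns, plus row names.
# matrix() fills column by column, so the first three values form column 1.
rn <- matrix(c(17.06, 19.20, 17.71,
               181.71, 242.87, 191.16),
             ncol = 2,
             dimnames = list(c("cyl", "disp", "hp"),
                             c("%IncMSE", "IncNodePurity")))

print(rn)      # the labels on the left are row names, not a data column
ncol(rn)       # number of real columns
rownames(rn)   # the labels that look like a first column

# Asking for a column that does not exist raises an error;
# tryCatch captures the message instead of stopping the script.
err <- tryCatch(rn[, 3], error = function(e) conditionMessage(e))
print(err)
```

Printed, the matrix appears to have three columns, but ncol(rn) counts only the two numeric ones, which is why indexing column 3 fails.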

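Here's my brief detective trail for anyone interested in what the "importance" object actually is... I installed library(randomForest) and then ran an example from the online documentation: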

library(randomForest)
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000, 
             keep.forest=FALSE, importance=TRUE)
importance(mtcars.rf)

In this case the "importance" object looks like this (first few rows only, to save space):

       %IncMSE IncNodePurity
cyl  17.058932     181.70840
disp 19.203139     242.86776
hp   17.708221     191.15919
...

Clearly ncol(importance(mtcars.rf)) is 2, and the row names are probably what causes the confusion :)
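One way to avoid hard-coding a column number is to index by ncol() or by column name. The sketch below is only illustrative: the matrix is built by hand from the rounded values shown above rather than from a fitted model, and the variable names last, sorted and by_name are my own.

```r
# Hand-built stand-in for round(importance(mtcars.rf), 2), first three rows only.
rn <- matrix(c(17.06, 19.20, 17.71,
               181.71, 242.87, 191.16),
             ncol = 2,
             dimnames = list(c("cyl", "disp", "hp"),
                             c("%IncMSE", "IncNodePurity")))

# Order by whatever the last column is, so the code works for any column count.
last <- ncol(rn)
sorted <- rn[order(rn[, last], decreasing = TRUE), , drop = FALSE]
print(sorted)

# Equivalent here: refer to the column by name instead of by position.
by_name <- rn[order(rn[, "IncNodePurity"], decreasing = TRUE), , drop = FALSE]
print(by_name)
```

The drop = FALSE argument keeps the result a matrix even if only one row or column remains, so the row names are preserved.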
