Question

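I am new to R. I am trying to find variable importance using a random forest in R, and I am using the randomForest package. In the code below, train is the training set and trainy is the target variable. The target variable represents three classes, which I have encoded as 0, 1 and 2.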

require(randomForest)
train<-read.csv('TrData.csv',header=FALSE,sep=",")
trainy<-read.csv('TrData_y.csv',header=FALSE,sep=",")
trainy <- data.frame(lapply(trainy, factor))
set.seed(415)
fit <- randomForest(train,trainy,importance=TRUE, ntree=1000,proximity=TRUE)

Now, when I run this code and display summary(fit), the result is as follows.

 summary(fit)
                   Length  Class  Mode     
 call                  4 -none- call     
 type                  1 -none- character
 predicted             0 -none- NULL     
 err.rate              0 -none- NULL     
 confusion             0 -none- NULL     
 votes              5580 matrix numeric  
 oob.times          2790 -none- numeric  
 classes               2 -none- character
 importance         4100 -none- numeric  
 importanceSD       3075 -none- numeric  
 localImportance       0 -none- NULL     
 proximity       7784100 -none- numeric  
 ntree                 1 -none- numeric  
 mtry                  1 -none- numeric  
 forest                0 -none- NULL     
 y                     0 -none- NULL     
 test                  0 -none- NULL     
 inbag                 0 -none- NULL     
 > sapply(trainy,is.factor)
   V1 
   TRUE
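A note on reading the table above: calling summary() on a list-like object such as a fitted forest prints the Length, Class and Mode of each component, not the values stored in it. The sketch below illustrates the difference. It uses the built-in iris data purely as a stand-in (an assumption, since TrData.csv is not available here), so the numbers will differ from the real data.

```r
library(randomForest)

# iris is a built-in 3-class data set, used here only as a stand-in for the real data
set.seed(415)
fit <- randomForest(x = iris[, 1:4], y = iris$Species, ntree = 1000)

length(fit$ntree)    # 1: a single number is stored, which is what the Length column counts
fit$ntree            # 1000: the stored value itself
length(fit$classes)  # 3: one label per class
fit$type             # "classification"
print(fit)           # the package's own overview: type, number of trees, mtry, OOB error
```

Inspecting components with `$`, or printing the fit, is usually more informative than summary() for this kind of object.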

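classes shows 2 even though there are 3 classes, and ntree shows 1 even though I specified 1000 trees in the arguments to build the forest. I don't know what I am doing wrong in the code. While looking into this, I found another question: https://stats.stackexchange.com/questions/34363/randomforest-chooses-regression-instead-of-classification. Following it, I checked whether the target variable is treated as a factor, and it is, so that is not the problem. I also used importance to extract the variable importance: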

featimpt=importance(fit)
write.csv(file="featureimportance_R.csv",x=featimpt)

The resulting output provides variable importance for only 2 classes.
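For comparison, here is a minimal sketch of what importance() returns for a 3-class classification forest fitted with importance=TRUE: one column per class, followed by MeanDecreaseAccuracy and MeanDecreaseGini. The iris data is again only a stand-in for the real files, which is an assumption of this example.

```r
library(randomForest)

# Stand-in data: iris has 4 predictors and a 3-level factor response
set.seed(415)
fit <- randomForest(x = iris[, 1:4], y = iris$Species,
                    importance = TRUE, ntree = 1000)

imp <- importance(fit)
colnames(imp)
# "setosa" "versicolor" "virginica" "MeanDecreaseAccuracy" "MeanDecreaseGini"

# Only the overall permutation-based measure (mean decrease in accuracy)
imp_acc <- importance(fit, type = 1)
```

If the matrix written to the CSV has only two class columns, that suggests the forest itself was fitted with two classes, so the place to look is the randomForest() call rather than importance().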

Please tell me what I am doing wrong in this code. Any help would be appreciated. Thank you.
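One thing that may be worth checking, offered as a guess and not as a confirmed diagnosis: the package documentation describes y as a response vector, with classification assumed when it is a factor and unsupervised mode used when it is omitted. In the code above trainy is a one-column data.frame, so sapply(trainy, is.factor) is TRUE for the column while is.factor(trainy) itself is FALSE. The NULL entries for predicted, err.rate, confusion and forest, together with 2 classes, also look consistent with an unsupervised fit. The sketch below passes the column itself as y. The data is synthetic (an assumption, since the CSV files are not available here).

```r
library(randomForest)

# Synthetic stand-in for TrData.csv / TrData_y.csv (hypothetical data, not the asker's)
set.seed(415)
train  <- as.data.frame(matrix(rnorm(150 * 4), ncol = 4))
trainy <- data.frame(V1 = factor(rep(c(0, 1, 2), each = 50)))

is.factor(trainy)        # FALSE: the data.frame itself is not a factor
is.factor(trainy[[1]])   # TRUE: its only column is

# Extract the column so that y is a factor vector with three levels
y <- trainy[[1]]

fit <- randomForest(x = train, y = y,
                    importance = TRUE, ntree = 1000, proximity = TRUE)

fit$type      # "classification"
fit$classes   # "0" "1" "2"
fit$ntree     # 1000
```

With the real data, the equivalent change would be to replace trainy in the call with trainy[[1]] (or trainy$V1) and then compare fit$type and fit$classes with the values shown here.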

randomForest does not recognize 3 classes

0 answers: