Variable importance in R

Time: 2018-04-27 22:47:19

Tags: r variables random

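I have run into a problem with variable importance in R. It prints the importance values but not the variable names, and I cannot work out what the indices in the left-hand column refer to. The code and output are below.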

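I have a set of data of the following form, except that I have 192 variables and 10,000 observations. Columns 2-24 are continuous and the rest are categorical.

Output X1 X2 X3 X4
0      2  50 44 22
1      3  40 33 11
1      2  50 22 10
0      1  42 12 18

Update: I ran the same code without converting the categorical variables to factors. When I call varImp, it now prints the corresponding variable names. Does anyone know why this does not work when I convert the variables to factors?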
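
Since the question turns on converting the categorical columns to factors, it may help to check what that conversion actually produces. The sketch below uses base R only and a made-up toy data frame (`toy`, `via_apply` and `via_lapply` are illustrative names, not from the post). It compares the `data.frame(apply(..., 2, as.factor))` idiom with a column-wise `lapply`. `apply` coerces its result to a character matrix, so whether the columns end up as factors depends on the `stringsAsFactors` default of the R version in use.

```r
# Toy data frame (hypothetical values): two categorical columns stored as numbers
toy <- data.frame(a = c(1, 2, 1, 2), b = c(0, 1, 1, 0))

# apply() works on a matrix and coerces factor results to a character matrix,
# so the factor class is lost before data.frame() sees the values
via_apply <- data.frame(apply(toy, 2, as.factor))

# lapply() works column by column and keeps each result as a factor
via_lapply <- toy
via_lapply[] <- lapply(toy, factor)

# Compare the resulting column classes
print(sapply(via_apply, class))
print(sapply(via_lapply, class))
```

Running `sapply(clean_data, class)` after the conversion step would show which case applies to the real data.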

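library(caret)  #provides createDataPartition, train, trainControl, varImp

#Recode the string "NA" as a missing value
my_data$Output[my_data$Output == "NA"] <- NA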

#Converting Variables to Factors
my_data$Output <- factor(my_data$Output)

#Only use complete observations -- eliminate NA's
clean_data <- my_data[complete.cases(my_data),]

#Converts all columns to factors
clean_data[,25:189] = data.frame(apply(clean_data[,25:189], 2, as.factor))

#Split into testing and training
set.seed(7)
Data_Splitting <- createDataPartition(clean_data$Output,p=2/3,list=FALSE)
training = clean_data[Data_Splitting,]
testing = clean_data[-Data_Splitting,]

#Random Forest training 
set.seed(7)
rf_train <- train(Output ~ ., data = training, method = "rf",
                  trControl = trainControl(method = "cv", number = 4, classProbs = T,
                                           summaryFunction = twoClassSummary),
                  metric = "ROC")

#Plot of variable importance 
varImp(rf_train)
plot(varImp(rf_train))
print(rf_train)

     Overall
8     100.00
23     99.80
21     98.19
2      94.17
634    92.06
7      91.75
1010   81.26
636    69.02
9      56.88
630    49.90
1      42.60
4      36.95
16     29.34
15     29.10
1008   28.83
17     28.54
18     27.50
22     27.04
3      26.78
14     26.36
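
Some of the indices in this output (1010, 1008, 636) are larger than the 192 variables in the data. One possible reading, which the post does not confirm, is that the rows refer to the dummy-expanded columns rather than to the original variables. The formula interface of `train` builds its predictors with `model.matrix`, which turns each factor into one indicator column per non-reference level. A minimal sketch of that expansion in base R, on a hypothetical toy data frame:

```r
# Toy data (hypothetical): one numeric predictor and one 3-level factor
toy <- data.frame(
  Output = factor(c("no", "yes", "yes", "no", "yes", "no")),
  X1     = c(50, 40, 50, 42, 47, 39),
  X2     = factor(c("a", "b", "c", "a", "b", "c"))
)

# The formula interface expands each factor into indicator (dummy) columns
design <- model.matrix(Output ~ ., data = toy)

# Drop the intercept to see the predictor columns a model would receive
expanded_names <- colnames(design)[-1]
print(expanded_names)

cat("predictors in data:     ", ncol(toy) - 1, "\n")
cat("columns after expansion:", length(expanded_names), "\n")
```

If this is what is happening, applying the same call to `training` and indexing `colnames(...)[-1]` by the numbers in the left-hand column might recover the names, but that mapping is an assumption to verify rather than a documented guarantee.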

0 Answers:

No answers