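I'm having a problem with variable importance in R. It prints the importance values but not the variable names, and I can't figure out what the indices in the left column refer to. The code and output are below.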
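I have a data set of the following form, except that mine has 192 variables and 10,000 observations. Columns 2-24 are continuous and the rest are categorical.

Output X1 X2 X3 X4
0      2  50 44 22
1      3  40 33 11
1      2  50 22 10
0      1  42 12 18

Update: I ran the same code without converting the categorical variables to factors, and calling varImp() now prints the corresponding variable names. Does anyone know why this stops working when I convert the variables to categorical (factors)?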
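To make the problem easier to reproduce, the sample rows above can be loaded straight into R. This is only a sketch: the object name `sample_data` is hypothetical, and only the response is converted to a factor here.

```r
# Hypothetical object name; the rows are copied from the sample table
sample_data <- read.table(header = TRUE, text = "
Output X1 X2 X3 X4
0      2  50 44 22
1      3  40 33 11
1      2  50 22 10
0      1  42 12 18
")

# Classification with caret expects a factor response
sample_data$Output <- factor(sample_data$Output)
str(sample_data)
```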
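library(caret)  # createDataPartition(), train(), trainControl(), twoClassSummary(), varImp()
# Treat the string "NA" in the response as a real missing value
my_data$Output[my_data$Output == "NA"] <- NA
# Convert the response to a factor
my_data$Output <- factor(my_data$Output)
# Keep only complete observations (drop rows that contain NA)
clean_data <- my_data[complete.cases(my_data), ]
# Convert the categorical columns (25 to 189) to factors
clean_data[, 25:189] <- data.frame(apply(clean_data[, 25:189], 2, as.factor))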
# Split into training (2/3) and testing (1/3) sets
set.seed(7)
Data_Splitting <- createDataPartition(clean_data$Output, p = 2/3, list = FALSE)
training <- clean_data[Data_Splitting, ]
testing <- clean_data[-Data_Splitting, ]
# Random forest with 4-fold cross-validation, tuned on ROC
set.seed(7)
rf_train <- train(Output ~ ., data = training, method = "rf",
                  trControl = trainControl(method = "cv", number = 4,
                                           classProbs = TRUE,
                                           summaryFunction = twoClassSummary),
                  metric = "ROC")
# Print and plot the variable importance, then print the fitted model
varImp(rf_train)
plot(varImp(rf_train))
print(rf_train)
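One general R pitfall in the conversion step: `apply()` first coerces its data-frame argument to a matrix, so the factors built by `as.factor()` are flattened into a character matrix. Whether the surrounding `data.frame()` call turns those strings back into factors depends on the R version (`stringsAsFactors` defaulted to `TRUE` before R 4.0.0). The sketch below, on a hypothetical toy data frame `toy_df`, contrasts this with `lapply()`, which keeps each column a factor. It illustrates general R behavior and is not a confirmed cause of the missing names.

```r
# Hypothetical toy data frame with two integer-coded categorical columns
toy_df <- data.frame(a = c(1, 2, 2), b = c(3, 3, 4))

# apply() turns toy_df into a matrix, calls as.factor() on each column,
# then reassembles the results into a matrix, which drops the factor class
via_apply <- apply(toy_df, 2, as.factor)
is.matrix(via_apply)   # TRUE
typeof(via_apply)      # "character"

# lapply() works on the list of columns, so each result stays a factor;
# assigning into via_lapply[] keeps the data frame and its column names
via_lapply <- toy_df
via_lapply[] <- lapply(toy_df, as.factor)
sapply(via_lapply, is.factor)   # a: TRUE, b: TRUE
```

Applied to the question's data, the `lapply()` form would read `clean_data[25:189] <- lapply(clean_data[25:189], as.factor)`.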
     Overall
8     100.00
23     99.80
21     98.19
2      94.17
634    92.06
7      91.75
1010   81.26
636    69.02
9      56.88
630    49.90
1      42.60
4      36.95
16     29.34
15     29.10
1008   28.83
17     28.54
18     27.50
22     27.04
3      26.78
14     26.36
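A related point worth checking, offered as an assumption rather than something the output above confirms: with the formula interface (`train(Output ~ ., ...)`), factor predictors are typically expanded into dummy columns, so importance is reported per dummy column rather than per original variable. That would fit labels much larger than the number of original columns, such as 634 or 1010. The sketch below uses `model.matrix()` on a hypothetical toy data frame `toy` to show how such dummy columns are named (variable name followed by level). Comparing the column names of the real model matrix with the rows printed by `varImp()` may help map the labels back.

```r
# Hypothetical toy data: one numeric and one factor predictor
toy <- data.frame(
  y   = factor(c("no", "yes", "no", "yes")),
  num = c(1.5, 2.0, 3.5, 4.0),
  grp = factor(c("a", "b", "c", "a"))
)

# Build the design matrix and drop the intercept column
X <- model.matrix(y ~ ., data = toy)[, -1]

# The factor becomes one dummy column per non-reference level,
# named <variable><level>
colnames(X)   # "num"  "grpb" "grpc"
```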