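When I run varImp on a random forest, it appends suffixes such as .Q, .L, .C, and ^4 to the variable names. Does anyone know what these refer to, or whether I am doing something wrong?
I am using the caret package on a dataset that contains both ordered and categorical variables.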
student <- read.csv("http://cdn-files.soa.org/web/student-success-data-file.csv")
str(student)
# Making certain fields ordered factors
ordered.cat.vars <- c("Medu", "Fedu", "traveltime", "studytime", "famrel", "freetime", "goout", "Dalc", "Walc", "health")
student[,ordered.cat.vars] <- lapply(student[,ordered.cat.vars], factor)
student[,ordered.cat.vars] <- lapply(student[,ordered.cat.vars], ordered)
# Removing certain fields and creating target variable, G3.passflag
library(dplyr)
student <- student %>%
  select(-one_of(c("absences", "G1", "G2"))) %>%
  mutate(G3.passflag = ifelse(G3 >= 10, "pass", "fail")) %>%
  select(-one_of("G3"))
# Running random forest
library(caret)
trctrl <- trainControl(method = "cv", number = 5)
grid <- expand.grid(mtry = seq(1,15,1))
rf_1 <- train(
  form = G3.passflag ~ .,
  data = student,
  method = "rf",
  metric = "Accuracy",
  trControl = trctrl,
  tuneGrid = grid,
  importance = TRUE
)
varImp(rf_1)
For varImp, I get the following results:
  only 20 most important variables shown (out of 66)

            Importance
failures        100.00
goout.L          74.31
Medu.L           71.76
Fedu.Q           71.04
Medu.Q           69.52
famsupyes        69.04
goout^4          66.85
Fedu.C           63.28
Fedu.L           61.21
Fedu^4           60.66
Medu.C           60.53
Walc^4           60.25
Walc.C           60.23
Walc.L           58.16
freetime.C       57.65
age              57.39
goout.C          56.50
health.Q         55.75
freetime.Q       55.36
Mjobother        54.63
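
A minimal sketch that might help isolate where the suffixes come from. It assumes they originate in how R's formula interface encodes ordered factors (polynomial contrasts via contr.poly), rather than in varImp itself. The data frame `toy` and its column `x` are hypothetical and not part of the student data:

```r
# Hypothetical toy data: a single ordered factor with five levels
toy <- data.frame(x = ordered(c(1, 2, 3, 4, 5)))

# The formula interface expands factors with model.matrix();
# ordered factors get polynomial contrasts (contr.poly) by default
colnames(model.matrix(~ x, data = toy))
# "(Intercept)" "x.L" "x.Q" "x.C" "x^4"

# Column names of the default contrast matrix for five ordered levels
colnames(contr.poly(5))
# ".L" ".Q" ".C" "^4"
```

If this is indeed the cause, .L, .Q, .C and ^4 would be the linear, quadratic, cubic and fourth-degree contrast terms of each ordered factor, which would also be consistent with varImp reporting more variables than the data has columns.
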
Thank you for any help you can provide!
Thanks, Alex