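I have spent several hours trying to solve a problem involving factors. I am using the C50 library to fit a decision tree to a data frame, but creating model.DT fails with an "undefined columns selected" error. The goal is to predict students' academic standing from the other variables.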
The data frame looks like this:
ID   term  event_count  checkin_count  emergency_flag  probation
111  1     3            4              0               0
112  2     2            2              1               1
113  1     0            6              1               0
data$probation_status <- ifelse(data$PROBATION == 0, "good academic standing", "on probation")
data$TERM <- as.factor(data$TERM)
data$EVENT_COUNT <- as.factor(data$EVENT_COUNT)
data$CHECKIN_COUNT <- as.factor(data$CHECKIN_COUNT)
data$EMERGENCYFLAG <- as.factor(data$EMERGENCYFLAG)
library(C50)
# Create the sample size and split into training and testing data
sample_size <- floor(0.8 * nrow(data))
training_index <- sample(seq_len(nrow(data)), size = sample_size)
train <- data[training_index,]
test <- data[-training_index,]
train$probation_status <- as.factor(train$probation_status)
str(train$probation_status)
predictors <- c('term','event_count','checkin_count','emergency_flag')
# Error occurs when executing the following line
# Error in `[.data.frame`(train, , predictors) : undefined columns selected
model.DT <-C5.0.default(x =train[,predictors],
y=train$probation_status)
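
One way to narrow this down is to compare the names in predictors with the names the data frame actually has, because `[.data.frame` raises "undefined columns selected" whenever a requested name is absent. The sketch below uses a toy data frame, not the real data. The upper-case names, including EMERGENCYFLAG without an underscore, are an assumption inferred from the data$TERM-style accesses above, and missing_cols and predictors_ok are names introduced only for illustration.

```r
# Toy stand-in for train; the upper-case column names are an assumption
# inferred from the data$TERM, data$EMERGENCYFLAG accesses above.
train <- data.frame(
  TERM          = factor(c(1, 2, 1)),
  EVENT_COUNT   = factor(c(3, 2, 0)),
  CHECKIN_COUNT = factor(c(4, 2, 6)),
  EMERGENCYFLAG = factor(c(0, 1, 1))
)

predictors <- c('term', 'event_count', 'checkin_count', 'emergency_flag')

# Names requested but not present in the data frame. Column names are
# case-sensitive, so any entry listed here would make
# train[, predictors] fail with "undefined columns selected".
missing_cols <- setdiff(predictors, names(train))
print(missing_cols)

# Selecting with names that actually exist in the data frame succeeds.
predictors_ok <- c('TERM', 'EVENT_COUNT', 'CHECKIN_COUNT', 'EMERGENCYFLAG')
x <- train[, predictors_ok]
str(x)
```

If setdiff() returns anything for the real train, those entries of predictors would need to match names(train) exactly before the subset is passed to C5.0.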
Any help would be greatly appreciated. Thanks.