"undefined columns selected" error in a C50 decision tree prediction model

Time: 2018-11-18 22:54:37

Tags: r prediction decision-tree

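I have spent hours trying to solve a problem that I think is related to factors. I am using the C50 library to build a decision tree prediction model on a data frame, but the line that creates model.DT fails with an "undefined columns selected" error. The goal is to predict a student's academic standing from the other variables.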

The data frame looks like this:
ID   term  event_count  checkin_count  emergency_flag  probation
111  1     3            4              0               0
112  2     2            2              1               1
113  1     0            6              1               0

data$probation_status <- ifelse(data$PROBATION == 0,
                                "good academic standing", "on probation")
data$TERM <- as.factor(data$TERM)
data$EVENT_COUNT <- as.factor(data$EVENT_COUNT)
data$CHECKIN_COUNT <- as.factor(data$CHECKIN_COUNT)
data$EMERGENCYFLAG <- as.factor(data$EMERGENCYFLAG)

library(C50)
# Compute the sample size and split into training and testing data
sample_size <- floor(0.8 * nrow(data))
training_index <- sample(seq_len(nrow(data)), size = sample_size)
train <- data[training_index,]
test <- data[-training_index,]

train$probation_status <- as.factor(train$probation_status)
str(train$probation_status)
predictors <- c('term','event_count','checkin_count','emergency_flag') 

# The error occurs when executing the call below:
# Error in `[.data.frame`(train, , predictors) : undefined columns selected
model.DT <- C5.0.default(x = train[, predictors],
                         y = train$probation_status)

Any help would be greatly appreciated. Thanks.
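A possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: `[.data.frame` raises "undefined columns selected" when a requested name is not in `names(train)`, and R column names are case-sensitive. The code above refers to columns such as `data$TERM` and `data$EMERGENCYFLAG`, while `predictors` uses lower-case names and `emergency_flag` with an underscore. The small data frame below is a hypothetical stand-in that assumes the real columns are named as in the `data$...` lines; `setdiff()` then lists which requested names are missing.

```r
# Hypothetical stand-in for the real data; the upper-case column names are an
# assumption taken from the data$TERM / data$EMERGENCYFLAG lines in the question.
train <- data.frame(
  ID            = c(111, 112, 113),
  TERM          = factor(c(1, 2, 1)),
  EVENT_COUNT   = factor(c(3, 2, 0)),
  CHECKIN_COUNT = factor(c(4, 2, 6)),
  EMERGENCYFLAG = factor(c(0, 1, 1)),
  PROBATION     = c(0, 1, 0)
)

predictors <- c("term", "event_count", "checkin_count", "emergency_flag")

# Names requested in `predictors` that do not exist in `train`.
# A non-empty result is what triggers "undefined columns selected".
missing_before <- setdiff(predictors, names(train))
print(missing_before)

# Lower-casing the column names resolves the case mismatches...
names(train) <- tolower(names(train))
missing_after_lower <- setdiff(predictors, names(train))
print(missing_after_lower)

# ...but "emergencyflag" still lacks the underscore, so rename it explicitly.
names(train)[names(train) == "emergencyflag"] <- "emergency_flag"
missing_final <- setdiff(predictors, names(train))

# With nothing missing, the subset that failed in the question now works.
x <- train[, predictors]
str(x)
```

If the real data turns out to use different names, running `setdiff(predictors, names(train))` on it should still show which entries of `predictors` need correcting. Once the subset succeeds, the result can be passed to `C5.0(x = train[, predictors], y = train$probation_status)`, with `y` kept as a factor.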

0 answers:

No answers yet.