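I'm working on some decision trees using data from the thread, and I'm running into a couple of errors. Last week I fit the trees with rpart, but now I'm using caret to combine logLoss with SMOTE for oversampling. Here is my code and the corresponding errors: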
set.seed(1234)
ind <- sample(2, nrow(data), replace=TRUE, prob=c(0.8, 0.2))
train <- data[ind==1,]
test <- data[ind==2,]
########################
#Building a new DT with logloss and CV
########################
ctrl <- trainControl(method="cv", number=5, classProbs=TRUE,
                     summaryFunction=mnLogLoss)
ll_tree <- train(TripType~., data=train, method="rpart", metric="logLoss",
                 trControl=ctrl)
Error in ctrl$summaryFunction(testOutput, lev, method) :
'data' should have columns consistent with 'lev'
In addition: Warning message:
In train.default(x, y, weights = w, ...) :
cannnot compute class probabilities for regression
###################
#Using SMOTE
###################
ctrl2 <- trainControl(method="cv", number=5, classProbs=TRUE,
                      summaryFunction=mnLogLoss, sampling = "smote")
smote_tree <- train(TripType~., data=train, trControl=ctrl2, method="rpart")
Error: sampling methods are only implemented for classification problems
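Both messages mention regression versus classification, which suggests caret is treating TripType as a numeric outcome. The sketch below shows one thing that might be worth checking, assuming TripType is stored as a number; the values in trip_type are made up for illustration and are not taken from the real data.

```r
# Hypothetical stand-in for the real TripType column (values are made up).
trip_type <- c(3, 4, 5, 3, 999)

# A numeric outcome leads caret to fit a regression model.
is.numeric(trip_type)

# Convert to a factor. make.names() prefixes the values so that the levels
# are valid R names, which classProbs = TRUE requires.
trip_type_f <- factor(make.names(trip_type))
levels(trip_type_f)

# Applied to the data in the question, this would read (assumption, untested here):
# train$TripType <- factor(make.names(train$TripType))
```

If the column is already a factor, the cause is probably something else, so checking str(train$TripType) first seems sensible.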
Any help would be appreciated, as this is my first time trying this.
Thanks