Question

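I'm working with the data from this thread to build some decision trees, and I've run into a couple of errors. Last week I ran the trees in rpart, but now I'm using caret to combine logLoss with SMOTE for oversampling. Here is my code and the corresponding errors: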

set.seed(1234)
ind <- sample(2, nrow(data), replace=TRUE, prob=c(0.8, 0.2))
train <- data[ind==1,]
test <- data[ind==2,]
########################
#Building a new DT with logloss and CV
########################

ctrl <- trainControl(method="cv", number=5, classProbs=TRUE, 
summaryFunction=mnLogLoss)

ll_tree <- train(TripType~., data=train, method="rpart",  metric="logLoss", 
trControl=ctrl)

Error in ctrl$summaryFunction(testOutput, lev, method) : 
  'data' should have columns consistent with 'lev'
In addition: Warning message:
In train.default(x, y, weights = w, ...) :
  cannnot compute class probabilities for regression


###################
#Using SMOTE
###################
ctrl2 <- trainControl(method="cv", number=5, classProbs=TRUE, 
summaryFunction=mnLogLoss, sampling = "smote")
smote_tree <- train(TripType~., data=train, trControl=ctrl2, method="rpart")

Error: sampling methods are only implemented for classification problems

Any help would be appreciated, as this is my first time trying this.
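
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: both messages ("cannnot compute class probabilities for regression" and "sampling methods are only implemented for classification problems") are consistent with caret treating TripType as a numeric outcome and therefore fitting a regression. If that is the case, converting the outcome to a factor before splitting the data should switch train() to classification. The data frame `toy` and its values below are hypothetical stand-ins for the real data, and the "T" prefix is just one way to get level names that are valid R variable names, which caret requires when classProbs=TRUE.

```r
# Hypothetical stand-in for the real data; only base R is used here.
toy <- data.frame(TripType = c(3, 5, 3, 999, 5, 3),
                  x = c(1.2, 0.4, 2.2, 3.1, 0.9, 1.7))

# A numeric outcome makes train() fit a regression model.
class(toy$TripType)

# Convert the outcome to a factor. The "T" prefix (an assumption, any
# letter works) turns levels such as 3 into "T3", a valid R name.
toy$TripType <- factor(paste0("T", toy$TripType))

# Both checks should be TRUE before calling train() with classProbs=TRUE.
is.factor(toy$TripType)
all(levels(toy$TripType) == make.names(levels(toy$TripType)))
```

Applied to the real `data` before the train/test split, the same conversion would let the trainControl() and train() calls above be rerun unchanged, assuming nothing else in the data is at fault.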

Thanks

Decision trees with logLoss and SMOTE

0 answers: