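I am using caret's train() function to find the optimal cp value for a CART decision tree, with a custom function that computes F1 as the metric. train() returns an error I can't make sense of. Perhaps the problem lies in the way I defined the metric. I have provided a reproducible example below and would greatly appreciate your advice.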
> library(caret)
> library(data.table)
> library(ROSE)
> data(hacide)
> train <- hacide.train
> test <- hacide.test
> numFolds = trainControl(method = "cv", number = 10)
> cpGrid = expand.grid(.cp = seq(0.01, 0.5, 0.01))
> f1 <- function(data, lev = NULL, model = NULL) {
+ f1_val <- F1_Score(y_pred = data$pred, y_true = data$obs, positive = lev[1])
+ c(F1 = f1_val)
+ }
> set.seed(12)
> train(cls ~ ., data = train,
+ method = "rpart",
+ tuneLength = 5,
+ metric = "F1",
+ trControl = trainControl(summaryFunction = f1,
+ classProbs = TRUE))
Error in train.default(x, y, weights = w, ...) :
At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1 . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
> levels(train$cls)
[1] "0" "1"
> class(train$cls)
[1] "factor"
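The error comes down to the level names themselves: a level counts as a valid R variable name when `make.names()` leaves it unchanged, and `"0"` and `"1"` fail that test because they start with a digit. The base-R sketch below mimics that check on a small stand-in factor. The helper `valid_level_names()` is hypothetical and only approximates the test caret performs; it is not caret's source.

```r
# Stand-in for train$cls: a factor with levels "0" and "1" (hypothetical data)
cls <- factor(c("0", "1", "0", "0", "1"))

# Hypothetical helper: TRUE only if make.names() leaves every level unchanged
valid_level_names <- function(f) {
  lv <- levels(f)
  all(lv == make.names(lv))
}

print(levels(cls))              # [1] "0" "1"
print(make.names(levels(cls)))  # [1] "X0" "X1"
print(valid_level_names(cls))   # [1] FALSE
```

The `"X0"`/`"X1"` names printed here are exactly the ones the error message says caret would otherwise generate for the class-probability columns.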
Answer 0 (score: 0)
You can try the following:
levels(train$cls) <- make.names(levels(train$cls))
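Then run the model again; that should fix the problem. Unfortunately, your example is not fully reproducible, because the question does not include the definition of the F1_Score function. See whether this works.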
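`F1_Score()` most likely comes from the MLmetrics package (`MLmetrics::F1_Score(y_true, y_pred, positive)`), in which case the example would also need `library(MLmetrics)`. To keep things self-contained, here is a hypothetical base-R stand-in, `f1_summary()`, written against the `(data, lev, model)` signature caret uses for a `summaryFunction`, where `data` has the factor columns `obs` and `pred`. Note that `lev[1]` is the first level, `"0"` (or `"X0"` after renaming); in `hacide` the rare class is `"1"`, so if its F1 is what matters, `lev[2]` would be the level to use.

```r
# Hypothetical base-R replacement for F1_Score(), shaped like a caret
# summaryFunction: data has factor columns obs and pred, lev holds the levels.
f1_summary <- function(data, lev = NULL, model = NULL, positive = lev[1]) {
  tp <- sum(data$pred == positive & data$obs == positive)  # true positives
  fp <- sum(data$pred == positive & data$obs != positive)  # false positives
  fn <- sum(data$pred != positive & data$obs == positive)  # false negatives
  precision <- if (tp + fp > 0) tp / (tp + fp) else 0
  recall    <- if (tp + fn > 0) tp / (tp + fn) else 0
  f1 <- if (precision + recall > 0) 2 * precision * recall / (precision + recall) else 0
  c(F1 = f1)  # caret expects a named numeric vector
}

# Toy check with already-renamed levels (hypothetical data)
lev <- c("X0", "X1")
toy <- data.frame(
  obs  = factor(c("X0", "X0", "X1", "X1", "X0"), levels = lev),
  pred = factor(c("X0", "X1", "X1", "X0", "X0"), levels = lev)
)
f1_summary(toy, lev = lev)                   # F1 for "X0": 2/3
f1_summary(toy, lev = lev, positive = "X1")  # F1 for "X1": 0.5
```

Passed as `summaryFunction = f1_summary` together with `metric = "F1"`, this should play the same role as the question's `f1` wrapper.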
The following works for me:
library(caret)
# Rename the levels "0"/"1" to syntactically valid names ("X0"/"X1")
levels(train$cls) <- make.names(levels(train$cls))
set.seed(12)
train(cls ~ ., data = train, method = "rpart", tuneLength = 5,
      metric = "ROC",
      trControl = trainControl(summaryFunction = twoClassSummary, classProbs = TRUE))
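For these two levels `make.names()` just prefixes them with `X`. If more readable class names are preferred, the factor can also be relabelled explicitly. The sketch below uses the hypothetical labels `"no"`/`"yes"` on a stand-in factor and checks that only the level names change, not the class counts.

```r
cls <- factor(c("0", "1", "0", "0", "1"))  # stand-in for train$cls (hypothetical data)
counts_before <- as.vector(table(cls))

# Map "0" -> "no" and "1" -> "yes"; both labels are valid R names (labels are hypothetical)
cls <- factor(cls, levels = c("0", "1"), labels = c("no", "yes"))

print(levels(cls))
print(table(cls))
stopifnot(identical(as.vector(table(cls)), counts_before))  # counts unchanged
```

Applied to the real data, the same line would read `train$cls <- factor(train$cls, levels = c("0", "1"), labels = c("no", "yes"))`.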