Question

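I use the mlr package in R to fit classification models for a binary problem. For each model, I perform cross-validation with embedded feature selection using the "selectFeatures" function. As output, I retrieve the mean AUC over the test sets and the predictions. To do so, after getting some advice (Get predictions on test sets in MLR), I use the "makeFeatSelWrapper" function in combination with the "resample" function. The objective seems to be achieved, but the results are strange. With logistic regression as the classifier, I get an AUC of 0.5, which means that no variable was selected. This result is unexpected, because I get an AUC of 0.9824432 with this classifier using the method mentioned in the linked question. With a neural network as the classifier, I get an error message:

Error in sum(x) : invalid 'type' (list) of argument

What is wrong?

Here is the sample code: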

# 1. Find a synthetic dataset for supervised learning (two classes)
###################################################################

install.packages("mlbench")
library(mlbench)
data(BreastCancer)

# generate 1000 rows, 21 quantitative candidate predictors and 1 target variable 
p<-mlbench.waveform(1000) 

# convert list into dataframe
dataset<-as.data.frame(p)

# drop third class to get 2 classes
dataset2  = subset(dataset, classes != 3)

# 2. Perform cross validation with embedded feature selection using logistic regression
#######################################################################################  

library(BBmisc)
library(nnet)
library(mlr)

# Choice of data 
mCT <- makeClassifTask(data =dataset2, target = "classes")

# Choice of algorithm i.e. logistic regression
mL <- makeLearner("classif.logreg", predict.type = "prob")

# Choice of cross-validations for folds 

outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)

# Choice of feature selection method

ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)

# Choice of hold-out sampling between training and test within the fold

inner = makeResampleDesc("Holdout",stratify = TRUE)

lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)

# 3. Perform cross validation with embedded feature selection using neural network
##################################################################################

library(BBmisc)
library(nnet)
library(mlr)

# Choice of data 
mCT <- makeClassifTask(data =dataset2, target = "classes")

# Choice of algorithm i.e. neural network
mL <- makeLearner("classif.nnet", predict.type = "prob")

# Choice of cross-validations for folds 

outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)

# Choice of feature selection method

ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)

# Choice of sampling between training and test within the fold

inner = makeResampleDesc("Holdout",stratify = TRUE)

lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)

Answer 1

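If you run the logistic regression part of your code a couple of times, you should also get the Error in sum(x) : invalid 'type' (list) of argument error. However, I find it odd that fixing a particular seed (e.g., set.seed(1)) before the resampling does not make the error appear or disappear consistently.

The error occurs in the internal mlr code that prints the output of the feature selection to the console. A very simple workaround is to avoid printing that output by setting show.info = FALSE in makeFeatSelWrapper (see the code below). While this removes the error, whatever causes it may have other consequences, although it is also possible that it only affects the printing code.

When running your code, I only get AUCs above 0.90. Below is your code for logistic regression, slightly reorganised and with the workaround applied. I have added droplevels() to dataset2 to remove the now-empty level 3 from the factor, although this is unrelated to the workaround.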

library(mlbench)
library(mlr)
data(BreastCancer)

p<-mlbench.waveform(1000)
dataset<-as.data.frame(p)
dataset2  = subset(dataset, classes != 3)
dataset2 <- droplevels(dataset2)

mCT <- makeClassifTask(data =dataset2, target = "classes")
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
mL <- makeLearner("classif.logreg", predict.type = "prob")
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl, show.info = FALSE)
# uncomment this for the error to appear again. Might need to run the code a couple of times to see the error
# lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
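
The effect of the droplevels() call above can be seen in a minimal base R sketch. The data frame `toy` and its values are made up purely for illustration; only subset() and droplevels() are taken from the code above.

```r
# Toy data frame with a three-level factor (hypothetical values, for illustration only)
toy <- data.frame(x = c(0.1, 0.4, 0.7, 0.9),
                  classes = factor(c(1, 2, 3, 1)))

# subset() removes the rows of class 3 but keeps "3" as an empty factor level
toy2 <- subset(toy, classes != 3)
print(levels(toy2$classes))
print(table(toy2$classes))

# droplevels() removes the unused level, leaving a genuine two-class target
toy3 <- droplevels(toy2)
print(levels(toy3$classes))
```

Without droplevels(), the target still carries three levels even though only two occur in the data, which can be confusing for anything that treats the factor levels as the set of classes.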

Edit: I reported an issue and created a pull request with a fix.
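
As a side note on the AUC of 0.5 mentioned in the question: a model that outputs the same score for every observation carries no ranking information, and its AUC is 0.5. The sketch below checks this with a rank-based (Mann-Whitney) AUC in base R. The helper `auc_rank` and the toy labels and scores are hypothetical and are not part of mlr; whether a constant prediction is really what happened in the question's run is an assumption.

```r
# Rank-based (Mann-Whitney) AUC; tied scores receive average ranks.
# auc_rank is a hypothetical helper written for this illustration, not an mlr function.
auc_rank <- function(scores, is_positive) {
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  r <- rank(scores)  # ties.method = "average" is the default
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy labels (made up for illustration)
is_positive <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)

# Same score for every observation: no ranking information
auc_constant <- auc_rank(rep(0.5, 6), is_positive)

# Scores that separate the two classes perfectly
auc_perfect <- auc_rank(c(0.9, 0.8, 0.2, 0.1, 0.7, 0.3), is_positive)

print(c(constant = auc_constant, perfect = auc_perfect))
```

An AUC of exactly 0.5 in a resampling result is therefore consistent with constant predictions, but it does not by itself prove that no feature was selected; inspecting the extracted feature selection results would be the more direct check.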
