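I want to try caret's sbf function in R with method = "ranger" for feature selection and classification, because training with method = "rf" takes far too long.
Whenever I start model training with sbf, I get this error message: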
Error in { : task 1 failed - "undefined columns selected"
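For background: my original dataset contains approx. 6,200 observations and approx. 15,200 features in binary representation, which should be reduced to approx. 1,700 features. The classification problem is binary.
I made a reproducible sample that resembles the original dataset, and it ends with the same error message. I have also added the output and session info.
Can somebody help me figure out how to get around this problem?
Source code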
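For the real data, the sbf filter has to do a top-k selection (keep about 1,700 of 15,200 features). Here is a minimal base-R sketch of such a filter. It assumes that a higher information-gain score means a more useful feature, so scores are negated before rank(), which ranks in ascending order. The helper name top_k_filter and the toy scores are hypothetical.

```r
# Hypothetical helper: TRUE for the k features with the highest scores.
# Assumes higher score = more informative, so the scores are negated
# because rank() assigns rank 1 to the smallest value.
top_k_filter <- function(score, k) {
  rank(-score, ties.method = "first") <= k
}

# Toy scores for five features; keep the best two
scores <- c(c = 0.02, d = 0.30, e = 0.00, f = 0.15, g = 0.30)
keep <- top_k_filter(scores, 2)
names(scores)[keep]  # "d" "g"
```

With caret, it could be plugged into the sbf functions list as igfit$filter <- function(score, x, y) top_k_filter(score, 1700), matching the (score, x, y) signature used in the question's code.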
library(doSNOW)
library(caret)
library(entropy)
library(ranger)
# setup elements for sbf functions
igfit <- caretSBF
# score function
multiigScore <- function(x, y) {
uniigScore <- function (x, y) {
library(entropy)
# make x binary
xbinary <- as.numeric(x>0)
ybinary <- as.numeric(y==levels(y)[1])
# make a joint frequency table
disc <- discretize2d(xbinary, ybinary, 2, 2, r1=c(0,1), r2=c(0,1))
# calculate ig score
ig_score<-mi.empirical(disc)
as.numeric(ig_score)
}
apply(x, 2, uniigScore, y=y)
}
igfit$score <- multiigScore
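# filter function: keep the 5 features with the highest information gain
# (scores are negated because rank() ranks in ascending order)
igfit$filter <- function (score, x, y) rank(-score, ties.method = "first") <= 5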
# data
x <- 0:1
y <- c("a", "b")
train_y <- as.factor(sample(y, 100, replace = TRUE))
train_x <- data.frame(sample(x, 100, replace = TRUE),
sample(x, 100, replace = TRUE),
sample(x, 100, replace = TRUE),
sample(x, 100, replace = TRUE),
sample(x, 100, replace = TRUE),
sample(x, 100, replace = TRUE))
names(train_x) <-c("c", "d", "e", "f", "g", "h")
# control objects
custom_ctrl <- trainControl(method = "none")
sbf_ctrl <- sbfControl(functions = igfit,
method = "cv", number = 10,
multivariate = TRUE, allowParallel = TRUE,
saveDetails = TRUE, returnResamp = "final", verbose = TRUE)
sbf_fit <- sbf(train_x, train_y,
trControl = custom_ctrl,
sbfControl = sbf_ctrl,
method = "ranger",
tuneGrid = expand.grid(mtry=c(2)))
Output
Error in { : task 1 failed - "undefined columns selected"
Session info
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=German_Germany.1252
[2] LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] randomForest_4.6-12 e1071_1.6-7 ranger_0.5.0
[4] entropy_1.2.1 caret_6.0-71 ggplot2_2.1.0
[7] lattice_0.20-33 doSNOW_1.0.14 snow_0.4-1
[10] iterators_1.0.8 foreach_1.4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 magrittr_1.5 splines_3.2.5
[4] MASS_7.3-45 munsell_0.4.3 colorspace_1.2-6
[7] minqa_1.2.4 stringr_1.1.0 car_2.1-3
[10] plyr_1.8.4 tools_3.2.5 parallel_3.2.5
[13] nnet_7.3-12 pbkrtest_0.4-6 grid_3.2.5
[16] gtable_0.2.0 nlme_3.1-125 mgcv_1.8-12
[19] quantreg_5.29 class_7.3-14 MatrixModels_0.4-1
[22] lme4_1.1-12 Matrix_1.2-4 nloptr_1.0.4
[25] reshape2_1.4.1 codetools_0.2-14 stringi_1.1.1
[28] compiler_3.2.5 scales_0.4.0 stats4_3.2.5
[31] SparseM_1.72
Answer
I think I found the solution myself:
To make sbf work with "ranger", change
custom_ctrl <- trainControl(method = "none")
to
custom_ctrl <- trainControl(method = "none", classProbs = TRUE)
The default value of classProbs is FALSE, which causes the problem when using "ranger".