R caret: what causes the error "undefined columns selected" when using sbf with method "ranger"?

Time: 2016-09-19 13:27:23

Tags: r r-caret

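I want to try out the sbf function from the caret package in R for feature selection and classification with method "ranger", because training with method "rf" takes a very long time.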

Whenever sbf reaches the model training step, I get the error message:

Error in { : task 1 failed - "undefined columns selected" 

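For background: my original dataset contains approx. 6,200 observations and approx. 15,200 features in a binary feature representation, which should be reduced to approx. 1,700 features. The classification problem is binary.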

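I made a reproducible sample similar to the original dataset that ends with the same error message. I have also added the output and the session info.

Can somebody help me figure out how this problem can be circumvented?

Source code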

library(doSNOW)
library(caret)
library(entropy)
library(ranger)

# setup elements for sbf functions
igfit <- caretSBF

# score function
multiigScore <- function(x, y) {
  uniigScore <- function (x, y) {
    library(entropy)
    # make x binary
    xbinary <- as.numeric(x>0)
    ybinary <- as.numeric(y==levels(y)[1])
    # make a joint frequency table
    disc <- discretize2d(xbinary, ybinary, 2, 2, r1=c(0,1), r2=c(0,1))
    # calculate ig score
    ig_score<-mi.empirical(disc)
    as.numeric(ig_score)
  }
  apply(x, 2, uniigScore, y=y)
}

igfit$score <- multiigScore

# filter function
igfit$filter <- function (score, x, y) rank(score, ties.method = "first") <= 5

# data
x <- 0:1
y <- c("a", "b")
train_y <- as.factor(sample(y, 100, replace = T))
train_x <- data.frame(sample(x, 100, replace = T), 
                      sample(x, 100, replace = T), 
                      sample(x, 100, replace = T), 
                      sample(x, 100, replace = T), 
                      sample(x, 100, replace = T), 
                      sample(x, 100, replace = T))
names(train_x) <-c("c", "d", "e", "f", "g", "h")

# control objects
custom_ctrl <- trainControl(method = "none")
sbf_ctrl <- sbfControl(functions = igfit, 
                       method = "cv", number = 10, 
                       multivariate = T,  allowParallel = T, 
                       saveDetails = T, returnResamp = "final", verbose = T)

sbf_fit <- sbf(train_x, train_y, 
               trControl = custom_ctrl,
               sbfControl = sbf_ctrl,
               method = "ranger",
               tuneGrid = expand.grid(mtry=c(2)))

Output

Error in { : task 1 failed - "undefined columns selected"

Session info

R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252 
[2] LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] randomForest_4.6-12 e1071_1.6-7         ranger_0.5.0       
 [4] entropy_1.2.1       caret_6.0-71        ggplot2_2.1.0      
 [7] lattice_0.20-33     doSNOW_1.0.14       snow_0.4-1         
[10] iterators_1.0.8     foreach_1.4.3      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7        magrittr_1.5       splines_3.2.5     
 [4] MASS_7.3-45        munsell_0.4.3      colorspace_1.2-6  
 [7] minqa_1.2.4        stringr_1.1.0      car_2.1-3         
[10] plyr_1.8.4         tools_3.2.5        parallel_3.2.5    
[13] nnet_7.3-12        pbkrtest_0.4-6     grid_3.2.5        
[16] gtable_0.2.0       nlme_3.1-125       mgcv_1.8-12       
[19] quantreg_5.29      class_7.3-14       MatrixModels_0.4-1
[22] lme4_1.1-12        Matrix_1.2-4       nloptr_1.0.4      
[25] reshape2_1.4.1     codetools_0.2-14   stringi_1.1.1     
[28] compiler_3.2.5     scales_0.4.0       stats4_3.2.5      
[31] SparseM_1.72   

1 answer:

Answer 0 (score: 1)

I think I found the solution myself:

To get sbf to work with "ranger", custom_ctrl <- trainControl(method = "none") has to be changed to custom_ctrl <- trainControl(method = "none", classProbs = TRUE). The default value of classProbs is FALSE, which causes problems when using "ranger".
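
For reference, here is a minimal sketch of the changed setup. It has not been tested against every caret release, so treat it as an illustration only. Several parts are assumptions that are not in the original post: the default caretSBF score (ANOVA p-values) stands in for the custom information-gain score, a seed is set for repeatability, and the one-row tuning grid is built from getModelInfo, because newer caret versions tune splitrule and min.node.size for "ranger" in addition to mtry.

```r
library(caret)
library(ranger)

set.seed(42)  # assumption: fixed seed so the toy data is repeatable

# Toy data like in the question: six binary features, binary outcome
train_x <- as.data.frame(matrix(sample(0:1, 600, replace = TRUE), ncol = 6))
names(train_x) <- c("c", "d", "e", "f", "g", "h")
train_y <- factor(sample(c("a", "b"), 100, replace = TRUE))

# Stand-in filter setup (assumption): default caretSBF p-value score,
# keeping the five features with the smallest p-values
pfit <- caretSBF
pfit$filter <- function(score, x, y) rank(score, ties.method = "first") <= 5

# One-row tuning grid; the extra ranger parameters are added only if
# the installed caret version lists them
params <- as.character(
  getModelInfo("ranger", regex = FALSE)$ranger$parameters$parameter
)
grid <- data.frame(mtry = 2)
if ("splitrule" %in% params) grid$splitrule <- "gini"
if ("min.node.size" %in% params) grid$min.node.size <- 1

# The actual change: classProbs = TRUE instead of the default FALSE
custom_ctrl <- trainControl(method = "none", classProbs = TRUE)
sbf_ctrl <- sbfControl(functions = pfit, method = "cv", number = 10)

sbf_fit <- sbf(train_x, train_y,
               trControl = custom_ctrl,
               sbfControl = sbf_ctrl,
               method = "ranger",
               tuneGrid = grid)
```

With the custom score and filter functions from the question, only the trainControl line needs to change. Note that classProbs = TRUE requires the class levels to be valid R variable names, which "a" and "b" are.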