caret function 'train' failing for bagged SVM

Time: 2015-08-06 00:02:51

Tags: r svm r-caret bioconductor kernlab

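I'm using the Bioconductor package MLSeq with R 3.1.2 on Ubuntu. I've worked through the example provided by the package, and it runs fine. However, I want to use the classify function with the bagsvm method, so in chunk 14 I changed the code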

svm <- classify(data = data.trainS4, method = "svm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

to

bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

which produces the error:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa   
 Min.   : NA   Min.   : NA 
 1st Qu.: NA   1st Qu.: NA 
 Median : NA   Median : NA 
 Mean   :NaN   Mean   :NaN 
 3rd Qu.: NA   3rd Qu.: NA 
 Max.   : NA   Max.   : NA 
 NA's   :1     NA's   :1   
Error in train.default(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit,  :
  Stopping
In addition: There were 17 warnings (use warnings() to see them)

The warnings are:

 Warning messages:
1: executing %dopar% sequentially: no parallel backend registered
2: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: vars=150 Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars,  :
  task 1 failed - "could not find function "lev""

Warning 2 is then repeated 14 times, followed by:

17: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

traceback() produced:

4: stop("Stopping")
3: train.default(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, 
       predict = svmBag$pred, aggregate = svmBag$aggregate), trControl = ctrl, 
       ...)
2: train(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, 
       predict = svmBag$pred, aggregate = svmBag$aggregate), trControl = ctrl, 
       ...)
1: classify(data = data.trainS4, method = "bagsvm", normalize = "deseq", 
       deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

I thought the problem might be that the kernlab library, which I believe the MLSeq code uses, wasn't loaded, so I tried

library(kernlab)
bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
               deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")

which results in the same error, but the warnings change to:

Warning messages:
1: In eval(expr, envir, enclos) :
  model fit failed for Fold1.Rep1: vars=150 Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars,  :
  task 1 failed - "no applicable method for 'predict' applied to an object of class "c('ksvm', 'vm')""

repeated 15 times, then

16: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

I don't believe this problem is specific to MLSeq, because I tried running the train function directly as
library(caret)
ctrl <- trainControl(method = "repeatedcv", number = 5,
    repeats = 3)
train <- train(counts, conditions, method = "bag", B = 100,
           bagControl = bagControl(fit = svmBag$fit, predict = svmBag$pred,
                                   aggregate = svmBag$aggregate), trControl = ctrl)

where counts is a data frame of RNA-Seq data and conditions is a factor of the classes, and I get exactly the same result. Any help is greatly appreciated.

2 Answers:

Answer 0 (score: 1)

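I admit I haven't tried to reproduce all of your steps. However, the only thing you did was move from a working "support vector machine" to a "bagged ensemble of SVMs". I'm not sure whether you know entirely what that means, but in short: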

Instead of building 1 model from all of the (training) data, it:

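  • builds many models
  • where each model uses a randomly drawn (bootstrap) sample of the training data ("bagging")
  • and checks the quality of each model by looking at how well it does on the unused part of the training data.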
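To make the description above concrete, here is a small base-R sketch of bagging, split into the same three steps that caret's bagControl() takes (fit, predict, aggregate). It is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the SVM, all function names (fit_centroid, predict_centroid, bag_fit, bag_predict) are hypothetical, and this is not how caret or MLSeq implement bagging.

```r
# Toy bagging sketch in base R. A nearest-centroid classifier stands in for
# the SVM; every function name here is hypothetical (not a caret/MLSeq API).

fit_centroid <- function(x, y) {
  x <- as.matrix(x)
  lev <- levels(y)
  # One column of feature means per class (p x K matrix)
  cen <- vapply(lev, function(l) colMeans(x[y == l, , drop = FALSE]),
                numeric(ncol(x)))
  list(centroids = matrix(cen, nrow = ncol(x)), lev = lev)
}

predict_centroid <- function(model, x) {
  x <- as.matrix(x)
  # Squared distance from every row to every class centroid (n x K matrix)
  d <- vapply(seq_along(model$lev),
              function(k) rowSums(sweep(x, 2, model$centroids[, k])^2),
              numeric(nrow(x)))
  d <- matrix(d, nrow = nrow(x))
  factor(model$lev[apply(d, 1, which.min)], levels = model$lev)
}

bag_fit <- function(x, y, B = 25, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  lapply(seq_len(B), function(b) {
    in_bag <- sample.int(n, n, replace = TRUE)   # bootstrap sample for this sub-model
    oob <- setdiff(seq_len(n), in_bag)           # rows this sub-model never saw
    model <- fit_centroid(x[in_bag, , drop = FALSE], y[in_bag])
    # Out-of-bag accuracy: quality check on the unused rows
    oob_acc <- if (length(oob) > 0)
      mean(predict_centroid(model, x[oob, , drop = FALSE]) == y[oob]) else NA
    list(model = model, oob_acc = oob_acc)
  })
}

bag_predict <- function(bags, x) {
  # Every sub-model votes; aggregate by majority vote per row
  votes <- sapply(bags, function(b) as.character(predict_centroid(b$model, x)))
  votes <- matrix(votes, nrow = nrow(as.matrix(x)))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

# Usage on the built-in iris data
bags <- bag_fit(iris[, 1:4], iris$Species, B = 25)
pred <- bag_predict(bags, iris[, 1:4])
mean(pred == iris$Species)                            # ensemble accuracy
summary(vapply(bags, function(b) b$oob_acc, numeric(1)))  # per-model OOB accuracy
```

Each sub-model only ever sees its own bootstrap sample, so a data problem that breaks every individual fit also breaks the whole ensemble.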

Given that, and since this is the only change you made, I suspect that:

  • you have too little data, or too many blank or NA entries, for any of the mini-SVM models in the bag to be fit.

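By default, the bagged model appears to be built from 100 mini-SVM models (see the default B = 100 in classify), each fit on its own bootstrap sample of the training data. If there is a chance that any of these sub-models gets features that are completely blank/NA, the bagged model will fail.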

How to fix it?

  • First, I would try raising B to a larger value, such as 1000. For a similar reason, I would check every feature for missing values, with something like table(is.na(feature_oi)).

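  • Next, if the model works with either of the fixes above, I would see whether you can repair the data by (a) checking whether the missing values can be recovered somehow, or (b) checking whether some of the observations with missing values are of such low quality that you might want to drop them entirely.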

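  • Of course, if the model works with these fixes, another solution is simply to keep using it that way, i.e. set B to 1000 or something large. Just keep in mind that if this is something you are trying to run in production, you would still be building something that could break.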

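  • Finally, if none of the fixes above makes the model work, then I'm not sure what the problem is. There could be a bug in the implementation of bagsvm itself. Hopefully someone more familiar with the library can offer more advice on that.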
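The missing-value checks suggested in the list above could look something like the following sketch. na_report() and the toy data frame are hypothetical; the sketch assumes rows are observations and columns are features (transpose first if your table is genes by samples). The last two assignments correspond to option (b): dropping only rows that are entirely NA, or keeping complete cases only.

```r
# Hypothetical helper (not part of MLSeq or caret): locate missing values,
# assuming rows are observations and columns are features.
na_report <- function(df) {
  per_col <- colSums(is.na(df))   # NA count for each feature
  per_row <- rowSums(is.na(df))   # NA count for each observation
  list(
    na_per_feature = per_col[per_col > 0],
    rows_all_na    = which(per_row == ncol(df))
  )
}

# Toy data standing in for a feature table such as `counts`
toy <- data.frame(g1 = c(5, NA, 3), g2 = c(NA, NA, 1), g3 = c(2, NA, 6))

table(is.na(toy$g1))          # single-feature check, as suggested in the answer
report <- na_report(toy)
report$na_per_feature         # g1: 1, g2: 2, g3: 1
report$rows_all_na            # row 2 is entirely NA

# Option (b): drop rows that are entirely NA (a logical index is safe even
# when no such rows exist, unlike negative indexing with integer(0))...
no_empty_rows <- toy[rowSums(!is.na(toy)) > 0, , drop = FALSE]
# ...or, more strictly, keep only fully observed rows
complete_only <- toy[complete.cases(toy), , drop = FALSE]
```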

Answer 1 (score: 1)

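I tried to debug my problem and seem to have stumbled on a solution by accident. Since the problem seemed to be in the prediction function, I stored the svmBag$pred function in a variable, predfunct, so that I could see where it was failing: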

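predfunct <- function(object, x)
{
  # lev() is kernlab's accessor for the response levels of a ksvm model
  if (is.character(lev(object))) {
    # classification: matrix of class probabilities, one column per class
    out <- predict(object, as.matrix(x), type = "probabilities")
    colnames(out) <- lev(object)
    rownames(out) <- NULL
  }
  else out <- predict(object, as.matrix(x))[, 1]  # regression: numeric vector
  out
}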

and then called

train <- train(counts, conditions, method = "bag", B = 100, 
       bagControl = bagControl(fit = svmBag$fit, predict = predfunct, 
                               aggregate = svmBag$aggregate), trControl = ctrl)

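This is the same as the last code block in the question, but with predfunct in place of svmBag$pred. Somehow this fixed the problem, and everything runs fine. If anyone can figure out why this works, and ideally find a solution that is less of a hack, I'll accept your answer.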
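A plausible explanation, not verified against the caret source for this version: svmBag$pred is defined inside caret's namespace, so its unqualified calls to lev() and predict() are looked up from there rather than from the workspace. Without kernlab attached, lev() cannot be found at all (the first set of warnings); with kernlab attached, lev() is found, but predict() presumably still resolves to the S3 generic that caret imports from stats, which has no method for the S4 class ksvm (the second set of warnings). predfunct, defined at top level after library(kernlab), instead picks up kernlab's own predict() from the search path. If that is right, a less hacky option is to qualify the kernlab calls explicitly. The sketch below assumes kernlab is installed and exports lev() and its predict() generic; kernlab does not need to be attached. The commented-out lines reuse the train() call from the question and need caret plus your own counts and conditions.

```r
# Sketch: svmBag$pred with kernlab calls namespace-qualified, so they resolve
# regardless of which environment the function was defined in.
# Assumes the kernlab package is installed (it need not be attached).
predfunct <- function(object, x) {
  if (is.character(kernlab::lev(object))) {
    # Classification: class-probability matrix, one column per class
    out <- kernlab::predict(object, as.matrix(x), type = "probabilities")
    colnames(out) <- kernlab::lev(object)
    rownames(out) <- NULL
  } else {
    # Regression: first column of the prediction matrix as a plain vector
    out <- kernlab::predict(object, as.matrix(x))[, 1]
  }
  out
}

# Usage, as in the question (requires caret, counts and conditions):
# library(caret)
# ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
# fit <- train(counts, conditions, method = "bag", B = 100,
#              bagControl = bagControl(fit = svmBag$fit, predict = predfunct,
#                                      aggregate = svmBag$aggregate),
#              trControl = ctrl)
```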