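I'm using the Bioconductor package MLSeq with R 3.1.2 on Ubuntu. I worked through the example provided by the package, which ran fine. However, I want to use the classify function with the bagsvm method, so in chunk 14 I changed the code from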
svm <- classify(data = data.trainS4, method = "svm", normalize = "deseq",
deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")
to
bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")
which produces the error:
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :1 NA's :1
Error in train.default(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, :
Stopping
In addition: There were 17 warnings (use warnings() to see them)
The warnings are:
Warning messages:
1: executing %dopar% sequentially: no parallel backend registered
2: In eval(expr, envir, enclos) :
model fit failed for Fold1.Rep1: vars=150 Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars, :
task 1 failed - "could not find function "lev""
Warning 2 is then repeated 14 more times, followed by:
17: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... :
There were missing values in resampled performance measures.
Running traceback() produced:
4: stop("Stopping")
3: train.default(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, predict = svmBag$pred, aggregate = svmBag$aggregate), trControl = ctrl, ...)
2: train(counts, conditions, method = "bag", B = B, bagControl = bagControl(fit = svmBag$fit, predict = svmBag$pred, aggregate = svmBag$aggregate), trControl = ctrl, ...)
1: classify(data = data.trainS4, method = "bagsvm", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")
I thought the problem might be that the kernlab library, which I believe the MLSeq code uses, was not loaded, so I tried
library(kernlab)
bagsvm <- classify(data = data.trainS4, method = "bagsvm", normalize = "deseq",
deseqTransform = "vst", cv = 5, rpt = 3, ref = "T")
This resulted in the same error, but the warnings changed to:
Warning messages:
1: In eval(expr, envir, enclos) :
model fit failed for Fold1.Rep1: vars=150 Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars, :
task 1 failed - "no applicable method for 'predict' applied to an object of class "c('ksvm', 'vm')""
This warning is repeated 15 times, followed by:
16: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... :
There were missing values in resampled performance measures.
I don't believe this problem is specific to MLSeq, because I also tried calling caret's train function directly as
ctrl <- trainControl(method = "repeatedcv", number = 5,
repeats = 3)
train <- train(counts, conditions, method = "bag", B = 100,
bagControl = bagControl(fit = svmBag$fit, predict = svmBag$pred,
aggregate = svmBag$aggregate), trControl = ctrl)
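where counts is a data frame containing the RNASeq data and conditions is a factor giving the classes, and I got exactly the same result. Any help is greatly appreciated.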
Answer 0 (score: 1)
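I admit I haven't tried to reproduce all of your steps. However, the only thing you did was switch from an "SVM", which works, to a "bagged ensemble of SVMs". I'm not sure whether you know entirely what that means, but in short: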
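Instead of building just one model from all of the (training) data, bagging fits many models, each on a bootstrap resample of the training data, and aggregates their predictions.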
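Since that is the case, and since it is the only change you made, I suspect that you have so many NA entries that some of the mini SVM models in the bagging cannot be fitted. It looks like mini-SVM models are broken into sets of 100 samples, by default. (See the default B = 100 option in classify.) If, for example, there is even a 1-in-100 chance that one of these sub-models ends up with a feature that is entirely blank/NA, then the bagged model will fail.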
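To make the bagging idea concrete, here is a minimal base-R sketch. It is illustrative only and is not the caret or MLSeq implementation: the toy data, the base learner (fit_centroid, predict_centroid) and the choice of 25 resamples are all assumptions made for the example. For reference, caret's bag documents B as the number of bootstrap samples to train over.

```r
# Minimal bagging sketch (assumed toy setup, not the caret/MLSeq code).
set.seed(1)

# Toy two-class data: one numeric feature, classes reasonably well separated.
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 4))
y <- factor(rep(c("N", "T"), each = 30))

# Hypothetical base learner: store the mean of the feature for each class.
fit_centroid <- function(x, y) sapply(split(x, y), mean)

# Predict the class whose stored mean is closest to each new value.
predict_centroid <- function(model, newx) {
  vapply(newx,
         function(v) names(model)[which.min(abs(v - model))],
         character(1))
}

# Fit B models, each on a bootstrap resample (sampling rows with replacement).
B <- 25
models <- lapply(seq_len(B), function(b) {
  idx <- sample(seq_along(x), replace = TRUE)
  fit_centroid(x[idx], y[idx])
})

# Aggregate: every model votes, the majority class wins.
votes <- sapply(models, predict_centroid, newx = x)   # one column per model
bagged <- apply(votes, 1, function(v) names(which.max(table(v))))

accuracy <- mean(bagged == y)
```

In this sketch a single resample that cannot be fitted would break the whole lapply call, which is the kind of failure the paragraph above is speculating about.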
How can you fix it?
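First, I would try raising the B value to something larger, such as 1000. For similar reasons, I would check for missing values in each of the features, with something like table(is.na(feature_oi)).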
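Next, if the model works with either of the fixes above, I would see whether you can repair the data by (a) checking whether the missing values can be recovered somehow, or (b) checking whether some of the observations with missing values are of such low quality that you should consider dropping them entirely.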
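The missing-value checks described above could look roughly like this in base R. The small counts matrix is a made-up stand-in, and the layout (samples in rows, features in columns) is an assumption; adapt it to however your own data is arranged.

```r
# Hypothetical toy data standing in for the real counts (samples in rows).
counts <- matrix(c(10, NA,  3,
                    7,  5, NA,
                    0,  2,  8,
                    4, NA,  1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("sample", 1:4),
                                 c("geneA", "geneB", "geneC")))

# How many values are missing overall?
table(is.na(counts))

# Missing values per feature and per observation.
na_per_feature <- colSums(is.na(counts))
na_per_sample  <- rowSums(is.na(counts))

# Option (b) from the text: keep only observations without missing values.
complete <- counts[complete.cases(counts), , drop = FALSE]
```

If every entry of na_per_feature is zero, missing input values are unlikely to be the cause and the NA hypothesis can be set aside.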
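Of course, if the model works with these fixes, another solution is simply to keep using it that way: set B to 1000 or something similarly large. Keep in mind that if this is something you intend to run in production, you will still have built something that may break.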
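Finally, if the fixes above don't get the model working, then I'm not sure what the problem is. There may be a bug in the implementation of bagsvm itself. Hopefully someone more familiar with the library can offer more advice on that front.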
Answer 1 (score: 1)
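I tried to debug my problem and seem to have stumbled on a solution by accident. Since the problem appeared to arise in the prediction function, I stored the svmBag$pred function as a variable, predfunct, so that I could see where it was failing: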
predfunct<-function (object, x)
{
if (is.character(lev(object))) {
out <- predict(object, as.matrix(x), type = "probabilities")
colnames(out) <- lev(object)
rownames(out) <- NULL
}
else out <- predict(object, as.matrix(x))[, 1]
out
}
I then called
train <- train(counts, conditions, method = "bag", B = 100,
bagControl = bagControl(fit = svmBag$fit, predict = predfunct,
aggregate = svmBag$aggregate), trControl = ctrl)
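exactly as in the last code block of the question, but with predfunct in place of svmBag$pred. Somehow this solved the problem, and everything runs fine. If anyone can work out why this works, and ideally find a less hacky solution, I will accept your answer.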
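One possible explanation, offered as a hypothesis rather than something this thread confirms: R looks up free names such as lev and predict in the environment where a function was defined, not where it is called. A function stored inside a package namespace sees only what that namespace imports, whereas a copy re-created at the prompt sees every attached package, including kernlab once library(kernlab) has been run. The base-R sketch below imitates this with two hypothetical environments (pkg_ns, attached) and a stand-in helper (lev_standin); none of these names come from caret or kernlab.

```r
# 'attached' plays the role of the search path with a library attached.
attached <- new.env(parent = baseenv())
assign("lev_standin", function(object) object$levels, envir = attached)

# 'pkg_ns' plays the role of a namespace that never imported the helper.
pkg_ns <- new.env(parent = baseenv())

# A prediction-style function that relies on the helper being visible.
pred <- function(object) lev_standin(object)
environment(pred) <- pkg_ns

obj <- list(levels = c("N", "T"))

# Called with the namespace-like environment, the helper cannot be found.
failed <- inherits(tryCatch(pred(obj), error = function(e) e), "error")

# The same function body, re-homed where the helper is visible, works.
pred_copy <- pred
environment(pred_copy) <- attached
from_copy <- pred_copy(obj)
```

If this is indeed what happens here, the copied predfunct only works while kernlab is attached, so keeping the library(kernlab) call in the script would matter.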