I am trying to use the rfe function from the caret package with a PLS-DA model.
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] splines grid parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] mclust_4.4 Kendall_2.2 doBy_4.5-13 survival_2.37-7 statmod_1.4.20
[6] preprocessCore_1.26.1 sva_3.10.0 mgcv_1.8-4 nlme_3.1-119 corpcor_1.6.7
[11] car_2.0-22 reshape2_1.4.1 gplots_2.16.0 DMwR_0.4.1 mi_0.09-19
[16] arm_1.7-07 lme4_1.1-7 Matrix_1.1-5 MASS_7.3-37 randomForest_4.6-10
[21] plyr_1.8.1 pls_2.4-3 caret_6.0-41 ggplot2_1.0.0 lattice_0.20-29
[26] pcaMethods_1.54.0 Rcpp_0.11.4 Biobase_2.24.0 BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] abind_1.4-0 bitops_1.0-6 boot_1.3-14 BradleyTerry2_1.0-5 brglm_0.5-9 caTools_1.17.1
[7] class_7.3-11 coda_0.16-1 codetools_0.2-10 colorspace_1.2-4 compiler_3.1.1 digest_0.6.8
[13] e1071_1.6-4 foreach_1.4.2 foreign_0.8-62 gdata_2.13.3 gtable_0.1.2 gtools_3.4.1
[19] iterators_1.0.7 KernSmooth_2.23-13 minqa_1.2.4 munsell_0.4.2 nloptr_1.0.4 nnet_7.3-8
[25] proto_0.3-10 quantmod_0.4-3 R2WinBUGS_2.1-19 ROCR_1.0-5 rpart_4.1-8 scales_0.2.4
[31] stringr_0.6.2 tools_3.1.1 TTR_0.22-0 xts_0.9-7 zoo_1.7-11
As practice, I ran the following example on the iris data.
data(iris)
subsets <- 2:4
ctrl <- rfeControl(functions = caretFuncs, method = 'cv', number = 5, verbose=TRUE)
trctrl <- trainControl(method='cv', number=5)
mod <- rfe(Species ~., data = iris, sizes = subsets, rfeControl = ctrl, trControl = trctrl, method = 'pls')
Everything works fine.
mod
Recursive feature selection
Outer resampling method: Cross-Validated (5 fold)
Resampling performance over subset size:
Variables Accuracy Kappa AccuracySD KappaSD Selected
2 0.6533 0.48 0.02981 0.04472
3 0.8067 0.71 0.06412 0.09618 *
4 0.7867 0.68 0.07674 0.11511
The top 3 variables (out of 3):
Sepal.Width, Petal.Length, Sepal.Length
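However, when I try to replicate this on data I generated myself, I get the following error. I cannot understand why! If you have any ideas, I would really like to hear them.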
x <- as.data.frame(matrix(0,10,10))
for(i in 1:9) {x[,i] <- rnorm(10,0,1)}
x[,10] <- as.factor(rbinom(10, 1, 0.5))
subsets <- 2:9
ctrl <- rfeControl(functions = caretFuncs, method = 'cv', number = 5, verbose=TRUE)
trctrl <- trainControl(method='cv', number=5)
mod <- rfe(V10 ~., data = x, sizes = subsets, rfeControl = ctrl, trControl = trctrl, method = 'pls')
Error in { : task 1 failed - "undefined columns selected"
In addition: Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
2: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
3: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
4: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
5: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
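Answer 0 (score: 1)
I have solved it (after a lot of back and forth): the levels of the response factor must be character strings for PLS-DA to work with RFE in caret.
For example...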
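library(caret)  # provides rfe(), rfeControl(), caretFuncs and trainControl(); method = 'pls' also needs the pls package installed
x <- data.frame(matrix(rnorm(1000),100,10))
# Response factor whose levels are character labels rather than 0/1
y <- as.factor(c(rep('Positive',40), rep('Negative',60)))
data <- data.frame(x,y)
subsets <- 2:9
ctrl <- rfeControl(functions = caretFuncs, method = 'cv', number = 5, verbose=TRUE)
trctrl <- trainControl(method='cv', number=5)
mod <- rfe(y ~., data, sizes = subsets, rfeControl = ctrl, trControl = trctrl, method = 'pls')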
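If the response already exists as 0/1 values, as in the question, it can be relabelled instead of regenerated. The following is a minimal sketch in base R only, assuming the diagnosis above is right and that the numeric-looking levels "0" and "1" are what trips up the rfe call. The seed and the label names "Negative" and "Positive" are illustrative assumptions, not part of the original post.

```r
# Sketch: relabel a 0/1 response so its factor levels are descriptive strings.
set.seed(42)  # assumption: seed added only to make the sketch reproducible

# Nine random predictors (columns V1..V9), as in the question
x <- as.data.frame(matrix(rnorm(90), nrow = 10, ncol = 9))

# Raw 0/1 outcome; as.factor() on this would give the levels "0" and "1"
y_raw <- rbinom(10, 1, 0.5)

# Option 1: assign explicit character labels (names chosen for illustration)
y <- factor(y_raw, levels = c(0, 1), labels = c("Negative", "Positive"))

# Option 2: let base R derive syntactically valid names from the old levels
y_alt <- factor(y_raw, levels = c(0, 1))
levels(y_alt) <- make.names(levels(y_alt))

# Attach the relabelled response as column V10
x$V10 <- y
str(x$V10)
```

The resulting data frame can then be passed to the same rfe(V10 ~ ., data = x, ...) call as in the question. Option 2 is handy when the original coding should stay recognisable, since make.names() only prefixes the old values.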