I'm trying to use update to refit a model that was trained in caret with method='rpart', and then predict on new data. Here is the code I used to build the model:
# Load the packages
library(AppliedPredictiveModeling)
data(solubility)
library(caret)
library(rpart)
# Data Prep
MWTr <- subset(solTrainXtrans, select='MolWeight') # dim(MWTr) yields 951,1
MWTe <- subset(solTestXtrans, select='MolWeight') # dim(MWTe) yields 316,1
# Training
trCtrl <- trainControl(method='cv')
cpTune <- train(MWTr, solTrainY,
method='rpart',
trControl=trCtrl)
When I use the cpTune model to predict on new data, there is no problem. The following code runs without any issues:
pred <- predict(cpTune, solTestXtrans) # length(pred) is 316
But when I update the cpTune model with a different cp parameter, I run into a problem:
# Update
cpTune1 <- update(cpTune, param=list(cp=0.5))
pred1 <- predict(cpTune1, solTestXtrans) # length(pred1) is 951
This is the warning message:
Warning message:
'newdata' had 316 rows but variables found have 951 rows
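One plausible cause, sketched here under an assumption I have not verified against caret's source: that update() rebuilds the predictor set from the stored training data with something like trainingData[, colnames(trainingData) != ".outcome"], i.e. without drop = FALSE. With a single predictor, that subset collapses to a bare vector, as.data.frame() then names the column x instead of MolWeight, and at prediction time model.frame() cannot find x in the new data, so it falls back to the 951-row training vector stored in the formula's environment. The synthetic data and the fit_rpart() helper below are hypothetical stand-ins that only mimic caret's rpart wrapper; only rpart is required.

```r
# Sketch of the suspected mechanism; fit_rpart() and the data are hypothetical.
library(rpart)

set.seed(1)
# Synthetic stand-ins with the same row counts as MWTr (951) and MWTe (316)
train_df <- data.frame(MolWeight = runif(951))
train_df$.outcome <- 2 * train_df$MolWeight + rnorm(951, sd = 0.1)
test_df <- data.frame(MolWeight = runif(316))

# Shaped like caret's rpart fit: rebuild a data frame from x, fit `.outcome ~ .`
fit_rpart <- function(x, y) {
  dat <- if (is.data.frame(x)) x else as.data.frame(x)
  dat$.outcome <- y
  rpart(.outcome ~ ., data = dat)
}

keep <- colnames(train_df) != ".outcome"

# Default drop = TRUE: one selected column collapses to a bare numeric vector,
# so inside fit_rpart() as.data.frame() names the column "x", not "MolWeight"
x_dropped <- train_df[, keep]
fit_bad <- fit_rpart(x_dropped, train_df$.outcome)

# `x` is not a column of test_df, so model.frame() finds the 951-row `x` in the
# formula's environment and warns:
# 'newdata' had 316 rows but variables found have 951 rows
pred_bad <- predict(fit_bad, newdata = test_df)
length(pred_bad)  # 951

# drop = FALSE keeps a one-column data frame, so the model uses MolWeight,
# which test_df does contain
x_kept <- train_df[, keep, drop = FALSE]
fit_ok <- fit_rpart(x_kept, train_df$.outcome)
pred_ok <- predict(fit_ok, newdata = test_df)
length(pred_ok)  # 316
```

If this is the cause, it would also explain why the formula interface below behaves the same way: the training data still contains only one predictor.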
I also tried the formula interface, but it didn't help. This is the code I tried:
MWTr <- data.frame(MolWeight=solTrainXtrans$MolWeight,
Y=solTrainY)
MWTe <- data.frame(MolWeight=solTestXtrans$MolWeight)
cpTune <- train(Y~MolWeight, data=MWTr,
method='rpart',
trControl=trCtrl)
# predicting using cpTune works well
cpTune1 <- update(cpTune, param=list(cp=0.5))
predict(cpTune1, MWTe) # yields same problem as above
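Why does the updated model return a prediction vector whose length matches the data frame used for training (951 rows) instead of the new data (316 rows)?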
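A possible workaround, assuming the problem is confined to update(): refit at the fixed cp with train() itself instead of updating the tuned model. With trainControl(method = "none"), train() skips resampling and fits a single model, which requires tuneGrid to have exactly one row. This is a sketch, untested against the package versions listed below.

```r
library(AppliedPredictiveModeling)
library(caret)
library(rpart)
data(solubility)

# Same single-predictor data frames as above
MWTr <- subset(solTrainXtrans, select = "MolWeight")
MWTe <- subset(solTestXtrans, select = "MolWeight")

# Fit one model at cp = 0.5 directly, with no resampling and no update() call
cpFixed <- train(MWTr, solTrainY,
                 method = "rpart",
                 tuneGrid = data.frame(cp = 0.5),
                 trControl = trainControl(method = "none"))

# Check that the predictions line up with the new data rather than the training data
pred2 <- predict(cpFixed, MWTe)
length(pred2) == nrow(MWTe)
```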
Here is my session info:
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rpart_4.1-13 caret_6.0-81
[3] ggplot2_3.1.0 lattice_0.20-35
[5] AppliedPredictiveModeling_1.1-7