Caret train() fails on missing values when building a tree

Date: 2017-04-27 10:45:46

Tags: r decision-tree missing-data r-caret rpart

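I am using R and trying to build a decision tree. I have already used ctree from the party package and rpart from the rpart package.

However, since I need to cross-validate my model, I started using the caret package, because it lets me do that with the train() function and the method I want to use: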

library(caret)
cvCtrl <- trainControl(method = "repeatedcv", repeats = 2,
                       classProbs = TRUE)

ctree.installed <- train(TARGET ~ OPENING_BALANCE + MONTHS_SINCE_EXPEDITION +
                           RS_DESC + SAP_STATUS + ACTIVATION_STATUS + ROTUL_STATUS +
                           SIM_STATUS + RATE_PLAN_SEGMENT_NORM,
                         data = trainSet,
                         method = "ctree",
                         trControl = cvCtrl)
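For context on what trainControl(method = "repeatedcv", repeats = 2) asks for: with caret's default of number = 10 folds, every candidate model is fitted 10 x 2 = 20 times. The base-R sketch below only illustrates the idea. make_repeated_folds() is a hypothetical helper, not part of caret, and it uses plain random folds, whereas caret builds its own folds (for a factor outcome it samples within each class).

```r
# Hypothetical helper (not part of caret): returns the training-row indices
# for each of k folds, repeated `repeats` times with a fresh shuffle each time.
make_repeated_folds <- function(n, k = 10, repeats = 2) {
  folds <- list()
  for (r in seq_len(repeats)) {
    # Assign each row to one of k folds of (nearly) equal size, in random order
    fold_id <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      # Train on every row outside fold f; fold f is held out for evaluation
      folds[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(fold_id != f)
    }
  }
  folds
}

set.seed(1234)
idx <- make_repeated_folds(n = 100)
length(idx)        # 10 folds x 2 repeats = 20 resamples
lengths(idx)[1:3]  # each training set holds 90 of the 100 rows
```

Each of those 20 fits sees whatever rows survive the na.action step, which is why the missing-value handling below matters for every resample.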

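However, my variables OPENING_BALANCE and MONTHS_SINCE_EXPEDITION have some missing values, so the function fails. I don't understand why this happens, since I am building a tree. When I use the other packages, this problem does not occur.

This is the error: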

Error in na.fail.default(list(TARGET = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,  : 
missing values in object
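The traceback names na.fail.default, which matches the fact that the formula interface of caret's train() defaults to na.action = na.fail. The base-R snippet below, on a small made-up data frame (only the column name is borrowed from the question), shows what the three usual na.action functions do with a row containing an NA.

```r
# Small made-up data frame with one missing predictor value
df <- data.frame(
  TARGET = factor(c("a", "b", "a")),
  OPENING_BALANCE = c(10, NA, 30)
)

# na.fail() stops with exactly the message seen in the error above
fail_msg <- tryCatch(na.fail(df), error = function(e) conditionMessage(e))
print(fail_msg)                     # "missing values in object"

# na.omit() silently drops the incomplete row ...
print(nrow(na.omit(df)))            # 2

# ... while na.pass() hands the data on untouched, NAs included
print(identical(na.pass(df), df))   # TRUE
```

So na.pass does not discard observations; it only defers the decision about NAs to the underlying model function.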

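I don't want to use na.action=pass, because I really don't want to discard these observations.

Am I doing something wrong? Why does this happen? Do you have any suggestions?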
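The "other packages work" observation is consistent with rpart's own defaults: rpart() uses na.action = na.rpart, which drops rows whose response is missing but keeps rows with missing predictors, and rpart.control() defaults to maxsurrogate = 5 and usesurrogate = 2 for routing them. A minimal sketch on simulated data (the variables x1, x2 and y are made up for illustration):

```r
library(rpart)  # recommended package shipped with most R installations

set.seed(42)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(d$x1 + d$x2 > 0, "yes", "no"))
d$x1[sample(n, 20)] <- NA          # 20 missing predictor values, none in y

# No na.action given, so rpart's default na.rpart is used: rows with
# missing predictors are kept and handled with surrogate splits.
fit <- rpart(y ~ x1 + x2, data = d)

fit$frame$n[1]                      # observations at the root node: 200
```

Calling a tree package directly therefore never goes through na.fail, whereas caret's formula interface does unless told otherwise.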

1 answer:

Answer 0 (score: 5)

I started by considering the dataset PimaIndiansDiabetes2 from the mlbench package, which has some missing values:

data(PimaIndiansDiabetes2, package = "mlbench")
head(PimaIndiansDiabetes2)
  pregnant glucose pressure triceps insulin mass pedigree age diabetes
1        6     148       72      35      NA 33.6    0.627  50      pos
2        1      85       66      29      NA 26.6    0.351  31      neg
3        8     183       64      NA      NA 23.3    0.672  32      pos
4        1      89       66      23      94 28.1    0.167  21      neg
5        0     137       40      35     168 43.1    2.288  33      pos
6        5     116       74      NA      NA 25.6    0.201  30      neg

The formula interface of train() uses na.action = na.fail by default, which is what raises the error in the question. So in train I set na.action to na.pass (which leaves the dataset unchanged), and then set the maxsurrogate parameter in ctree:

library(caret)
library(party)  # provides ctree_control()
cvCtrl <- trainControl(method = "repeatedcv", repeats = 2, classProbs = TRUE)
set.seed(1234)
ctree1 <- train(diabetes ~ ., data = PimaIndiansDiabetes2,
                method = "ctree",
                na.action = na.pass,                         # keep rows with NAs
                trControl = cvCtrl,
                controls = ctree_control(maxsurrogate = 2))  # passed on to party::ctree
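The maxsurrogate = 2 setting matters because ctree_control() defaults to maxsurrogate = 0, i.e. no surrogate splits. The idea behind a surrogate split is sketched below. This is a conceptual illustration only, not party's implementation: route_left() is a hypothetical helper, party chooses its surrogates internally, and the cut points and the default_left fallback are made-up assumptions.

```r
# Hypothetical helper illustrating the idea behind surrogate splits.
# primary:    list(var, cut) for the split "var < cut goes left"
# surrogates: up to maxsurrogate backups, each list(var, cut, agree);
#             agree = TRUE means "var < cut goes left" mimics the primary split
route_left <- function(row, primary, surrogates = list(), default_left = TRUE) {
  x <- row[[primary$var]]
  if (!is.na(x)) return(x < primary$cut)        # primary variable observed
  for (s in surrogates) {                       # otherwise try each surrogate
    v <- row[[s$var]]
    if (!is.na(v)) return(if (s$agree) v < s$cut else v >= s$cut)
  }
  default_left  # no usable surrogate: fall back to an assumed default side
}

# Made-up split: glucose < 128, with two surrogates (as with maxsurrogate = 2)
primary <- list(var = "glucose", cut = 128)
surrogates <- list(list(var = "mass", cut = 30, agree = TRUE),
                   list(var = "age",  cut = 29, agree = TRUE))

route_left(list(glucose = 148, mass = 33.6, age = 50), primary, surrogates)  # FALSE
route_left(list(glucose = NA,  mass = 26.6, age = 31), primary, surrogates)  # TRUE
route_left(list(glucose = NA,  mass = NA,   age = 32), primary, surrogates)  # FALSE
```

An observation with a missing split variable is therefore still placed in a child node instead of making the fit fail, which is why na.pass plus surrogates keeps all rows.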