split.data = function(data, p = 0.7, s = 666){
set.seed(s)
index = sample(1:dim(data)[1])
train = data[index[1:floor(dim(data)[1] * p)], ]
test = data[index[((ceiling(dim(data)[1] * p)) + 1):dim(data)[1]], ]
return(list(train = train, test = test))
}
allset= split.data(train.data, p = 0.7)
trainset = allset$train
testset = allset$test
train.ctree = ctree(Survived ~ Pclass + Sex + Age + SibSp + Fare
+ Parch + Embarked, data=trainset)
ctree.predict = predict(train.ctree, testset)
confusionMatrix(ctree.predict, testset$Survived)
这是一个从泰坦尼克号数据集预测乘客生存的代码。在训练集中,级别数与测试测试不匹配。概率不会四舍五入,而是作为单独的级别存在。