Getting the message "the data cannot have more levels than the reference" for the Titanic dataset

Posted: 2015-10-25 18:00:27

Tags: r

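library(party)   # ctree() (party package assumed; partykit also provides one)
library(caret)   # confusionMatrix()

# Shuffle the rows, then put the first fraction p in the training set
# and the remaining rows in the test set.
split.data = function(data, p = 0.7, s = 666) {
    set.seed(s)
    n = nrow(data)
    index = sample(n)
    n.train = floor(n * p)
    train = data[index[1:n.train], ]
    # Start right after the last training row; the original
    # ceiling(n * p) + 1 skipped one row whenever n * p was not a whole number.
    test = data[index[(n.train + 1):n], ]
    return(list(train = train, test = test))
}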

allset = split.data(train.data, p = 0.7)
trainset = allset$train
testset = allset$test

train.ctree = ctree(Survived ~ Pclass + Sex + Age + SibSp + Fare + Parch + Embarked,
                    data = trainset)
ctree.predict = predict(train.ctree, newdata = testset)
# This is the call that produces "the data cannot have more levels than the reference"
confusionMatrix(ctree.predict, testset$Survived)

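This code predicts passenger survival on the Titanic dataset. The predictions from the model fitted on the training set do not have the same number of levels as the test set: the predicted probabilities are not rounded off, so each distinct value ends up as a separate level.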
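A likely explanation, assuming `Survived` is stored as a numeric 0/1 column (as in Kaggle's `train.csv`): `ctree()` then fits a regression tree, and `predict()` returns the mean of `Survived` in each terminal node, i.e. a survival proportion. When those numbers are treated as a factor, every distinct proportion becomes its own level, while the reference only has `0` and `1`. The sketch below uses hypothetical toy vectors (`reference`, `pred_prob`) and a hypothetical helper `to_class()` with an assumed 0.5 cut-off to show the mismatch and one way to line the levels up in base R.

```r
# Hypothetical toy values: `reference` stands in for testset$Survived (numeric 0/1)
# and `pred_prob` for the numeric output of predict() on a regression ctree.
reference <- c(0, 1, 1, 0, 0, 1)
pred_prob <- c(0.18, 0.74, 0.95, 0.18, 0.52, 0.33)

# As factors, every distinct proportion becomes a level, but the reference has only two.
print(nlevels(factor(pred_prob)))   # 5
print(nlevels(factor(reference)))   # 2

# Hypothetical helper: turn proportions into class labels using an assumed cut-off,
# and force the result onto the reference's levels so both factors match.
to_class <- function(prob, ref_levels, cutoff = 0.5) {
  factor(ifelse(prob > cutoff, "1", "0"), levels = ref_levels)
}

ref_factor  <- factor(reference, levels = c(0, 1))
pred_factor <- to_class(pred_prob, levels(ref_factor))

# Base-R confusion matrix; caret::confusionMatrix(pred_factor, ref_factor)
# can be called on the same two factors.
print(table(Predicted = pred_factor, Actual = ref_factor))
```

Thresholding is only one option. Converting the response to a factor before fitting (for example `trainset$Survived <- factor(trainset$Survived)` and the same for `testset`) makes `ctree()` grow a classification tree, so `predict()` returns class labels with the levels `0` and `1` directly instead of proportions.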

0 answers