I am working with the German credit dataset (GermanCredit) from the caret package.
First, I build a very simple model:
library(caret)
library(randomForest)
library(pmml)
data(GermanCredit)
GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
GermanCredit$CheckingAccountStatus.lt.0 <- NULL
GermanCredit$SavingsAccountBonds.lt.100 <- NULL
GermanCredit$EmploymentDuration.lt.1 <- NULL
GermanCredit$EmploymentDuration.Unemployed <- NULL
GermanCredit$Personal.Male.Married.Widowed <- NULL
GermanCredit$Property.Unknown <- NULL
GermanCredit$Housing.ForFree <- NULL
set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest <- GermanCredit[-inTrain, ]
set.seed(1056)
credit.rf <- randomForest(Class~., data = GermanCreditTrain, ntree = 500)
Now I predict the outcome class on the test set several times and compare the results:
credit.pred1 <- predict(credit.rf, GermanCreditTest)
credit.pred2 <- predict(credit.rf, GermanCreditTest)
credit.pred3 <- predict(credit.rf, GermanCreditTest)
all.equal(credit.pred1, credit.pred2)
all.equal(credit.pred2, credit.pred3)
all.equal(credit.pred1, credit.pred3)
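I get identical predictions for all three passes. That is the result when I type the code manually into the RStudio console. However, if I copy and paste the code from my text editor (I posted it here: https://gist.github.com/anonymous/32b3c8194362d2e10527), all.equal reports 3 string mismatches in the second and third comparisons!
How is this possible?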
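all.equal only reports how many elements differ, not which ones. A minimal sketch for locating the disagreeing positions follows. The helper name diff_positions and the toy vectors p2 and p3 are hypothetical stand-ins and are not part of the original post.

```r
# Hypothetical helper (not from the original post): list the positions
# at which two prediction vectors disagree, with both predicted labels.
diff_positions <- function(a, b) {
  stopifnot(length(a) == length(b))
  a <- as.character(a)
  b <- as.character(b)
  idx <- which(a != b)
  data.frame(position = idx,
             first = a[idx],
             second = b[idx],
             stringsAsFactors = FALSE)
}

# Toy stand-ins for credit.pred2 and credit.pred3
p2 <- factor(c("Good", "Bad", "Good", "Good"), levels = c("Bad", "Good"))
p3 <- factor(c("Good", "Good", "Good", "Bad"), levels = c("Bad", "Good"))

mismatches <- diff_positions(p2, p3)
print(mismatches)
```

With the objects from the question, the call would be diff_positions(credit.pred2, credit.pred3). The reported positions index rows of GermanCreditTest.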
Answer 0 (score: 0)
Try using caret's train function:
credit.rf <- train(Class~., data = GermanCreditTrain, method="rf")
instead of
credit.rf <- randomForest(Class~., data = GermanCreditTrain, ntree = 500)
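I was able to reproduce the problem and am not sure what causes it. However, the train version above seems to give consistent results when pasted: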
> credit.rf <- train(Class~., data = GermanCreditTrain, method="rf")
>
> credit.pred1 <- predict(credit.rf, GermanCreditTest)
> credit.pred2 <- predict(credit.rf, GermanCreditTest)
> credit.pred3 <- predict(credit.rf, GermanCreditTest)
>
> all.equal(credit.pred1, credit.pred2)
[1] TRUE
> all.equal(credit.pred2, credit.pred3)
[1] TRUE
> all.equal(credit.pred1, credit.pred3)
[1] TRUE
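One possible explanation, which this thread does not confirm, is tied votes. The randomForest documentation for predict notes that ties are broken at random, and with two classes and an even ntree such as 500 an exact 250/250 split can occur. The sketch below flags tied rows in a matrix of per-class vote counts. The helper name tied_rows and the toy matrix are hypothetical and used only for illustration.

```r
# Hypothetical helper (not from the original post): return the rows of a
# vote-count matrix in which more than one class attains the top count.
tied_rows <- function(votes) {
  votes <- as.matrix(votes)
  top <- apply(votes, 1, max)        # highest vote count in each row
  n_top <- rowSums(votes == top)     # number of classes reaching it
  which(n_top > 1)
}

# Toy vote counts for a 500-tree, two-class forest
votes <- matrix(c(250, 250,
                  300, 200,
                  249, 251,
                  250, 250),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Bad", "Good")))

ties <- tied_rows(votes)
print(ties)
```

For the randomForest fit from the question, the counts could be obtained with predict(credit.rf, GermanCreditTest, type = "vote", norm.votes = FALSE). If the tied rows coincide with the rows whose predictions change, refitting with an odd ntree (for example 501) would be one thing to try.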