Different RF predictions when pasting code from different sources

Time: 2014-10-20 11:20:14

Tags: r machine-learning random-forest

I am working with the German credit dataset from the caret package.

First, I build a very simple model:

library(caret)
library(randomForest)
library(pmml)
data(GermanCredit)

GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
GermanCredit$CheckingAccountStatus.lt.0 <- NULL
GermanCredit$SavingsAccountBonds.lt.100 <- NULL
GermanCredit$EmploymentDuration.lt.1 <- NULL
GermanCredit$EmploymentDuration.Unemployed <- NULL
GermanCredit$Personal.Male.Married.Widowed <- NULL
GermanCredit$Property.Unknown <- NULL
GermanCredit$Housing.ForFree <- NULL

set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest  <- GermanCredit[-inTrain, ]

set.seed(1056)
credit.rf <- randomForest(Class~., data = GermanCreditTrain, ntree = 500)

Now, if I predict the outcome class on the test set several times and compare the results:

credit.pred1 <- predict(credit.rf, GermanCreditTest)
credit.pred2 <- predict(credit.rf, GermanCreditTest)
credit.pred3 <- predict(credit.rf, GermanCreditTest)

all.equal(credit.pred1, credit.pred2)
all.equal(credit.pred2, credit.pred3)
all.equal(credit.pred1, credit.pred3) 
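
To see which test rows disagree rather than only how many, the two prediction vectors can be compared element by element. This is a minimal sketch in base R; the helper name diff_positions is hypothetical, and the toy factors are made-up stand-ins for credit.pred1 and credit.pred2:

```r
# Hypothetical helper: positions where two prediction factors disagree.
diff_positions <- function(a, b) {
  stopifnot(length(a) == length(b))
  # Compare as character so that differing level sets cannot cause an error
  which(as.character(a) != as.character(b))
}

# Toy stand-ins for credit.pred1 and credit.pred2 (made-up values)
pred_a <- factor(c("Good", "Bad", "Good", "Good"), levels = c("Bad", "Good"))
pred_b <- factor(c("Good", "Good", "Good", "Bad"), levels = c("Bad", "Good"))

idx <- diff_positions(pred_a, pred_b)
print(idx)

# Side-by-side view of the disagreeing rows
print(data.frame(row = idx, first = pred_a[idx], second = pred_b[idx]))
```

With the objects from above, diff_positions(credit.pred1, credit.pred2) would give the indices of the test rows whose predicted class changed between two calls.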

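I get identical predictions for all 3 passes. That is what happens when I type the code into the RStudio interpreter by hand. However, if I copy and paste the code from my text editor (posted here: https://gist.github.com/anonymous/32b3c8194362d2e10527), I get a message reporting 3 string mismatches in the second and third comparisons!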

How is this possible?

1 Answer:

Answer 0: (score: 0)

Try using caret's train function:

credit.rf <- train(Class~., data = GermanCreditTrain, method="rf")

instead of

credit.rf <- randomForest(Class~., data = GermanCreditTrain, ntree = 500)

I was able to reproduce the problem and I am not sure what causes it. However, the above seems to work when pasted:

> credit.rf <- train(Class~., data = GermanCreditTrain, method="rf")
> 
> credit.pred1 <- predict(credit.rf, GermanCreditTest)
> credit.pred2 <- predict(credit.rf, GermanCreditTest)
> credit.pred3 <- predict(credit.rf, GermanCreditTest)
> 
> all.equal(credit.pred1, credit.pred2)
[1] TRUE
> all.equal(credit.pred2, credit.pred3)
[1] TRUE
> all.equal(credit.pred1, credit.pred3)
[1] TRUE
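
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the help page for predict.randomForest notes that ties in the class vote are broken at random, and with two classes and ntree = 500 an exact 250/250 split is possible. The sketch below rebuilds the randomForest model from the question and counts the test rows whose raw votes are tied. The names drop_cols, votes and tied are introduced here for illustration only:

```r
library(caret)
library(randomForest)
data(GermanCredit)

# Same preprocessing as in the question, written as a single column filter
GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
drop_cols <- c("CheckingAccountStatus.lt.0", "SavingsAccountBonds.lt.100",
               "EmploymentDuration.lt.1", "EmploymentDuration.Unemployed",
               "Personal.Male.Married.Widowed", "Property.Unknown",
               "Housing.ForFree")
GermanCredit <- GermanCredit[, !(names(GermanCredit) %in% drop_cols)]

set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest  <- GermanCredit[-inTrain, ]

set.seed(1056)
credit.rf <- randomForest(Class ~ ., data = GermanCreditTrain, ntree = 500)

# Raw (unnormalised) vote counts per class for each test row
votes <- predict(credit.rf, GermanCreditTest, type = "vote", norm.votes = FALSE)

# Rows where both classes received the same number of votes
tied <- which(votes[, 1] == votes[, 2])
print(length(tied))
print(votes[tied, , drop = FALSE])
```

If tied rows show up, an odd ntree (for example 501) rules out exact ties in a two-class problem. Whether ties explain the difference between typed and pasted code is not established here.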