Question

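I have been using sklearn's random forest to compare several models. I noticed that the random forest gives different results even with the same seed. I tried two approaches: calling random.seed(1234), and using the random forest's built-in random_state = 1234. In both cases the results are not reproducible. What am I missing?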

# 1
random.seed(1234)
RandomForestClassifier(max_depth=5, max_features=5, criterion='gini', min_samples_leaf = 10)
# or 2
RandomForestClassifier(max_depth=5, max_features=5, criterion='gini', min_samples_leaf = 10, random_state=1234)

Any ideas? Thanks!

Edit: adding a more complete version of the code

clf = RandomForestClassifier(max_depth=60, max_features=60, \
                        criterion='entropy', \
                        min_samples_leaf = 3, random_state=seed)
# As described, I tried random_state in several ways and still got different results
clf = clf.fit(X_train, y_train)

predicted = clf.predict(X_test)
predicted_prob = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = metrics.roc_curve(np.array(y_test), predicted_prob)
auc = metrics.auc(fpr,tpr)
print (auc)

Answer 1

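First, make sure you have the latest versions of the required modules (e.g. scipy, numpy). When you type random.seed(1234) in the code below, you are using the numpy generator, because `from numpy import *` makes `random` refer to `numpy.random`.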
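The distinction between the two `random` modules can be checked directly. The following is a minimal sketch, not part of the original answer: it assumes that `random` in the question refers to Python's standard-library module, which the question does not state. Seeding that module leaves numpy's global generator untouched, whereas `np.random.seed` resets it.

```python
import random          # Python's standard-library generator
import numpy as np     # numpy's global generator is the one sklearn falls back to

# Seeding the standard-library module does NOT reseed numpy's global generator,
# so the second draw simply continues the same numpy stream.
random.seed(1234)
a = np.random.rand(3)
random.seed(1234)
b = np.random.rand(3)
stdlib_seed_repeats = np.array_equal(a, b)

# Seeding numpy's global generator does make the draws repeat.
np.random.seed(1234)
c = np.random.rand(3)
np.random.seed(1234)
d = np.random.rand(3)
numpy_seed_repeats = np.array_equal(c, d)

print(stdlib_seed_repeats, numpy_seed_repeats)
```

If the question's `random.seed(1234)` was the standard-library call, this would explain why that first approach had no effect on the forest.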

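When using the random_state parameter of RandomForestClassifier, there are several options: an int, a RandomState instance, or None.

From the documentation:

If int, random_state is the seed used by the random number generator;

If RandomState instance, random_state is the random number generator;

If None, the random number generator is the RandomState instance used by np.random.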
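As an illustration of the int and RandomState options, here is a minimal sketch (not from the original answer). The helper name `fit_predict_proba` and the choice of `n_estimators=20` are assumptions made for this example; the dataset mirrors the one used later in this answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

def fit_predict_proba(random_state):
    """Hypothetical helper: fit a small forest and return class probabilities."""
    clf = RandomForestClassifier(n_estimators=20, max_depth=2,
                                 random_state=random_state)
    return clf.fit(X, y).predict_proba(X)

# Option 1: an int seed, passed to two independently built forests
same_with_int = np.array_equal(fit_predict_proba(1234),
                               fit_predict_proba(1234))

# Option 2: a fresh RandomState instance with the same seed for each forest
same_with_instance = np.array_equal(
    fit_predict_proba(np.random.RandomState(1234)),
    fit_predict_proba(np.random.RandomState(1234)))

print(same_with_int, same_with_instance)
```

Note that a single RandomState instance shared between two forests would be advanced by the first fit, so the second forest would start from a different state. That is why the sketch creates a fresh instance for each fit.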

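Here is how to use the same generator in both cases. I use the same (numpy) generator in both cases, and I get reproducible results (the same results in both cases).

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from numpy import *  # after this, `random` is numpy.random and `all` is numpy.all

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# Case 1: seed numpy's global generator and leave random_state unset
random.seed(1234)
clf = RandomForestClassifier(max_depth=2)
clf.fit(X, y)

# Case 2: random.seed(1234) reseeds the global generator and returns None,
# so random_state is None and the forest again uses the global generator
clf2 = RandomForestClassifier(max_depth=2, random_state = random.seed(1234))
clf2.fit(X, y)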

Check whether the results are the same:

all(clf.predict(X) == clf2.predict(X)) #True

Check after running the same code 5 times:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from numpy import *  # `random` and `all` below come from numpy

for i in range(5):
    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)

    # Seed numpy's global generator before building each forest
    random.seed(1234)
    clf = RandomForestClassifier(max_depth=2)
    clf.fit(X, y)

    # random.seed(1234) returns None, so this forest also uses the reseeded global generator
    clf2 = RandomForestClassifier(max_depth=2, random_state = random.seed(1234))
    clf2.fit(X, y)

    print(all(clf.predict(X) == clf2.predict(X)))

Result:

True True True True True

Answer 2

OK, what finally solved it was reinstalling the conda environment. I'm still not sure why the results were different. Thanks.

Python sklearn RandomForestClassifier non-reproducible results
