I'm seeing some odd randomness issues with scikit-learn on my MacBook (OS X 10.12.6, conda environment, Python 2.7). As a test, I set up the following script:
```python
import numpy.random as npr
import numpy.testing as npt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer


def test_randomness_one():
    npr.seed(54)
    rand_ints_one = npr.randint(500, size=50)
    npr.seed(54)
    rand_ints_two = npr.randint(500, size=50)
    npt.assert_array_equal(rand_ints_one, rand_ints_two)


def test_logit_one():
    data = load_breast_cancer()
    preds_one = LogisticRegression(random_state=2)\
        .fit(data['data'], data['target'])\
        .decision_function(data['data'])
    preds_two = LogisticRegression(random_state=2)\
        .fit(data['data'], data['target'])\
        .decision_function(data['data'])
    npt.assert_array_equal(preds_one, preds_two)


def test_logit_two():
    data = load_breast_cancer()
    preds_one = LogisticRegression()\
        .fit(data['data'], data['target'])\
        .decision_function(data['data'])
    preds_two = LogisticRegression()\
        .fit(data['data'], data['target'])\
        .decision_function(data['data'])
    npt.assert_array_equal(preds_one, preds_two)


# A note: main used for testing with the interpreter directly
# Executed with pytest *without* the below lines.
if __name__ == "__main__":
    test_randomness_one()
    test_logit_one()
    test_logit_two()
```
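Because these tests use `npt.assert_array_equal`, even a difference in the last bit of one float counts as a failure. A small helper like the one below (a hypothetical sketch, not part of the original script; the names `describe_difference` and `nearly_equal` are made up for illustration) reports how many elements differ and by how much, which helps tell floating-point noise apart from a genuinely different fit.

```python
import numpy as np


def describe_difference(a, b):
    """Summarize how two prediction arrays differ.

    Returns a dict with the number of elements that are not exactly equal
    and the largest absolute difference between any pair of elements.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("shape mismatch: {} vs {}".format(a.shape, b.shape))
    diff = np.abs(a - b)
    return {
        "n_unequal": int(np.count_nonzero(a != b)),
        "max_abs_diff": float(diff.max()) if diff.size else 0.0,
    }


def nearly_equal(a, b, rtol=1e-7, atol=0.0):
    """True if a and b agree within tolerance.

    Uses the |a - b| <= atol + rtol * |b| rule with the same default
    rtol/atol as numpy.testing.assert_allclose.
    """
    return bool(np.allclose(a, b, rtol=rtol, atol=atol))
```

For example, calling `describe_difference(preds_one, preds_two)` just before the `assert_array_equal` line in `test_logit_one` shows whether the `pytest` failures are differences around 1e-15 or something much larger.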
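In theory all of these results should be identical, and colleagues running Ubuntu and Windows boxes have confirmed that they are. On my machine, all of these tests pass when executed in the REPL or run via `python toy_test.py`. However, when run via `pytest toy_test.py`, `test_logit_one` fails consistently, while `test_logit_two` fails often, but not always. Where is the randomness coming from in this case? Is it at the OS level? The conda level? `pytest`? Or something else?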
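One way to start narrowing this down is to check whether `pytest` and `python` are even running the same interpreter and libraries. The sketch below is hypothetical: the function names are made up for illustration, and the listed thread-count variables are just the common ones for OpenMP, MKL, OpenBLAS and Apple's Accelerate (vecLib); whether any of them matters for this setup is an assumption to verify, not a known cause.

```python
import importlib
import os
import sys

# Thread-count variables read by common BLAS/OpenMP backends. Which one (if
# any) applies depends on how NumPy was built; this list is an assumption.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
)


def module_location(name):
    """Return 'version (path)' for an importable module, or a marker if not."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "<not importable>"
    return "{} ({})".format(getattr(module, "__version__", "?"),
                            getattr(module, "__file__", "?"))


def environment_snapshot(environ=None, modules=("numpy", "scipy", "sklearn")):
    """Collect interpreter, library and thread settings that may differ."""
    if environ is None:
        environ = os.environ
    snapshot = {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
    }
    for name in THREAD_VARS:
        snapshot[name] = environ.get(name, "<unset>")
    for name in modules:
        snapshot["module:" + name] = module_location(name)
    return snapshot


def format_snapshot(snapshot):
    """Render the snapshot as sorted 'key = value' lines for easy diffing."""
    return "\n".join("{} = {}".format(k, snapshot[k]) for k in sorted(snapshot))
```

Printing `format_snapshot(environment_snapshot())` from inside a test run with `pytest -s toy_test.py` (so output is not captured), and again from the `__main__` block via `python toy_test.py`, shows whether `executable` or the NumPy/scikit-learn paths differ between the two runners.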