Question

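I am trying to figure out why some sklearn cross-validation runs with defined index sets, the same input data, and the same random_state give different results with the same LogisticRegression model hyperparameters. My first thought was that the initial random_state might differ between runs. Then I noticed that when I pickle the random_state and load it back, a direct comparison of the two objects says they are different, yet the values returned by get_state are the same. Why is this?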

import pickle

import numpy as np

random_state = np.random.RandomState(0)
print(random_state)
# <mtrand.RandomState object at 0x12424e480>

with open("./rs.pkl", "wb") as f:
    pickle.dump(random_state, f, protocol=pickle.HIGHEST_PROTOCOL)
with open("./rs.pkl", "rb") as f:
    random_state_copy = pickle.load(f)
    print(random_state_copy)
# <mtrand.RandomState object at 0x126465240>
print(random_state == random_state_copy)
# False
print(str(random_state.get_state()) == str(random_state_copy.get_state()))
# True

Versions:

numpy = '1.13.3'

Python = '3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]'

Answer 1

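The copy of the initial random state in your example, from which nothing has been drawn yet, does in fact produce the same sequence of random numbers (tested on Python 3.6, numpy 1.15.4). As @jasonharper pointed out, equality testing is probably not implemented for RandomState, in which case == falls back to comparing object identity. So == returns False, but the two states behave identically.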

Insert the following snippet after the code you provided:

a = random_state.randint(0, 10, 5)
b = random_state_copy.randint(0, 10, 5)
print(a)
print(b)
print(a==b)

This produces:

[5 0 3 3 7]
[5 0 3 3 7]
[ True  True  True  True  True]

So the RandomState is most likely not what makes your runs differ: look elsewhere for the cause of the discrepancy.
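If you want to compare two RandomState objects by value rather than by identity, a minimal sketch follows. The helper name same_rng_state is hypothetical (it is not part of numpy), and the sketch assumes the default legacy tuple returned by get_state() for the Mersenne Twister generator: name, key array, position, has_gauss, cached_gaussian. Comparing the tuples directly with == is not reliable because one element is an array, so the array is compared with np.array_equal instead.

```python
import pickle

import numpy as np


def same_rng_state(rs_a, rs_b):
    """Return True if two RandomState objects hold the same internal state.

    Hypothetical helper, not a numpy API. Assumes get_state() returns the
    legacy 5-tuple (name, key array, position, has_gauss, cached_gaussian).
    """
    name_a, key_a, pos_a, has_gauss_a, cached_a = rs_a.get_state()
    name_b, key_b, pos_b, has_gauss_b, cached_b = rs_b.get_state()
    return bool(
        name_a == name_b
        and np.array_equal(key_a, key_b)  # element-wise check of the key array
        and pos_a == pos_b
        and has_gauss_a == has_gauss_b
        and cached_a == cached_b
    )


random_state = np.random.RandomState(0)
# Round-trip through pickle in memory instead of writing a file.
random_state_copy = pickle.loads(pickle.dumps(random_state))

# Compares the objects themselves, not their contents.
print(random_state == random_state_copy)
# Compares the contents of the two states.
print(same_rng_state(random_state, random_state_copy))
```

This avoids the str() round-trip used in the question, which can truncate the printed key array and so compares only part of the state.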

Why are pickled numpy random_state objects different when loaded?
