Question

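I have been using the random_state parameter of StratifiedKFold in sklearn, but it does not seem to be random. I expected that setting random_state=5 would give me a different test set than setting random_state=4, but that does not appear to be the case. I have put together some rough reproducible code below. First I load my data: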

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target

Then I set random_state=5 and store the test indices of the last fold:

skf = StratifiedKFold(n_splits=5, random_state=5)
for train, test in skf.split(X, y):
    full_test_1 = test  # after the loop this holds the last fold's test indices
full_test_1

array([ 40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  90,  91,  92,
        93,  94,  95,  96,  97,  98,  99, 140, 141, 142, 143, 144, 145,
       146, 147, 148, 149])

I do the same with random_state=4:

skf = StratifiedKFold(n_splits=5, random_state=4)
for train, test in skf.split(X, y):
    full_test_2 = test  # after the loop this holds the last fold's test indices
full_test_2

array([ 40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  90,  91,  92,
        93,  94,  95,  96,  97,  98,  99, 140, 141, 142, 143, 144, 145,
       146, 147, 148, 149])

Then I can check whether they are equal:

np.array_equal(full_test_1,full_test_2)
True

I would not expect two different random states to return the same indices. Is there a flaw in my logic or in my code?

Answer 1

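From the linked documentation:

random_state : None, int or RandomState

Pseudo-random number generator state used for shuffling when shuffle=True. If None, the default numpy RNG is used for shuffling.

You did not set shuffle=True when calling StratifiedKFold, so random_state has no effect. Without shuffling, the folds are taken from the data in its existing order, which is why both seeds return exactly the same indices.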
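To illustrate, here is a minimal sketch of the same experiment with shuffle=True, assuming the sklearn.model_selection version of StratifiedKFold. The helper name last_test_fold is made up for this example and is not part of sklearn. With shuffling enabled, the seed actually drives the split, so different seeds should generally give different folds, while repeating a seed reproduces the same fold.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

iris = datasets.load_iris()
X, y = iris.data, iris.target


def last_test_fold(seed):
    """Return the test indices of the last fold (hypothetical helper)."""
    # shuffle=True is what makes random_state take effect
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    test = None
    for _, test in skf.split(X, y):
        pass  # keep only the final fold, as in the question
    return test


fold_seed_5 = last_test_fold(5)
fold_seed_4 = last_test_fold(4)
fold_seed_5_again = last_test_fold(5)

# Different seeds: the folds are expected to differ
print(np.array_equal(fold_seed_5, fold_seed_4))

# Same seed: the fold is reproduced exactly
print(np.array_equal(fold_seed_5, fold_seed_5_again))
```

Each fold is still stratified, so it contains 10 samples from each of the three iris classes. Also, as far as I know, newer scikit-learn releases refuse to accept a random_state when shuffle=False and raise an error instead of silently ignoring it, so the code in the question may not run unchanged on a current install.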
