Strange behavior of sklearn train_test_split() with stratify on Python 3.7

Time: 2019-03-19 10:23:52

Tags: python python-3.x numpy scikit-learn

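I have a dataset with a 70/30 class balance, and I am trying to make a stratified train/test split with sklearn's train_test_split function. It works as expected on Python 3.5 but not on 3.7.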

Here is the code I use to reproduce it:

import numpy as np
from sklearn.model_selection import train_test_split

data = np.random.rand(1000000).reshape(100000, 10)

y_0 = [0]*30000
y_1 = [1]*70000
y_2 = y_0 + y_1

x_train, x_test, y_train, y_test = train_test_split(data, y_2, test_size=0.2, random_state=0, stratify=y_2)

print('Train set size : {}'.format(len(y_train)))
print('Value 1 repartition in train set : {}'.format(sum(y_train)/len(y_train)))
print('Test set size : {}'.format(len(y_test)))
print('Value 1 repartition in test set : {}'.format(sum(y_test)/len(y_test)))

Output on Python 3.7:

Train set size : 24102
Value 1 repartition in train set : 0.5414903327524687
Test set size : 20000
Value 1 repartition in test set : 0.53775

Output on Python 3.5:

Train set size : 80000
Value 1 repartition in train set : 0.7
Test set size : 20000
Value 1 repartition in test set : 0.7
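
As a sanity check on which output is the correct one, the expected per-class counts can be worked out by hand without scikit-learn. The sketch below is only an illustration: `expected_stratified_counts` is a hypothetical helper, not part of sklearn, and it uses simple rounding rather than whatever allocation rule train_test_split applies internally. For this data the rounding is exact, so the numbers should match a correct stratified split.

```python
from collections import Counter


def expected_stratified_counts(labels, test_size):
    """Hypothetical helper: ideal per-class counts for a stratified split.

    Each class contributes the fraction `test_size` of its samples to the
    test set; the remainder goes to the train set.
    """
    counts = Counter(labels)
    test_counts = {cls: round(n * test_size) for cls, n in counts.items()}
    train_counts = {cls: n - test_counts[cls] for cls, n in counts.items()}
    return train_counts, test_counts


# Same labels as in the question: 30000 zeros followed by 70000 ones.
labels = [0] * 30000 + [1] * 70000
train_counts, test_counts = expected_stratified_counts(labels, test_size=0.2)

train_total = sum(train_counts.values())
test_total = sum(test_counts.values())

print('Expected train set size : {}'.format(train_total))
print('Expected share of 1 in train set : {}'.format(train_counts[1] / train_total))
print('Expected test set size : {}'.format(test_total))
print('Expected share of 1 in test set : {}'.format(test_counts[1] / test_total))
```

The train and test sizes must add up to the 100000 input samples, which the Python 3.5 output does (80000 + 20000) and the Python 3.7 output does not (24102 + 20000).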

Library versions on 3.7:

Python 3.7.2
    numpy==1.16.1
    pandas==0.24.1
    python-dateutil==2.8.0
    pytz==2018.9
    scikit-learn==0.20.2
    scipy==1.2.1
    six==1.12.0

Library versions on 3.5:

Python 3.5.1
    numpy==1.16.1
    pandas==0.24.1
    python-dateutil==2.8.0
    pytz==2018.9
    scikit-learn==0.20.2
    scipy==1.2.1
    six==1.12.0

1 answer:

Answer 0: (score: 0)

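This kind of problem may be related to the processor architecture. Please check whether both Python installations have the same architecture (32-bit or 64-bit).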
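
One way to run that check, as a minimal sketch using only the standard library: run the snippet under each interpreter and compare the results. It reports the pointer size of the running Python build; whether a mismatch actually explains the behavior above is an assumption that this check can only help confirm or rule out.

```python
import platform
import struct
import sys

# Size of a C pointer in this interpreter build, in bits (32 or 64).
pointer_bits = struct.calcsize("P") * 8

# Cross-check: sys.maxsize exceeds 2**32 only on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print('Python version : {}'.format(platform.python_version()))
print('Pointer size : {}-bit'.format(pointer_bits))
print('64-bit build : {}'.format(is_64bit))
print('Machine : {}'.format(platform.machine()))
```

If both interpreters report the same pointer size, architecture is unlikely to be the cause and the two environments need to be compared in some other respect.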