Why does cross_val_score always report the same score for each fold?

Time: 2017-10-12 21:04:23

Tags: machine-learning scikit-learn

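I am running a Scikit-learn pipeline with cross-validation through cross_val_score.

However, over several runs, the score for each fold is always exactly the same. This puzzles me: shouldn't the splits be random?

Here is the relevant part of my code: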

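from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# clean_text_custom, MAX_NB_WORDS, data and binary_label_data are defined elsewhere
pipeline = Pipeline([
    ('vect', CountVectorizer(preprocessor=clean_text_custom, max_features=MAX_NB_WORDS, strip_accents='unicode')),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=-1)),
])

cross_val_score(pipeline, data, binary_label_data, cv=5, scoring='f1_micro')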
# array([ 0.25129587,  0.37780563,  0.33195376,  0.31269861,  0.14555337])

# Then I run it again and get the exact same scores for each fold
cross_val_score(pipeline, data, binary_label_data, cv=5, scoring='f1_micro')
# array([ 0.25129587,  0.37780563,  0.33195376,  0.31269861,  0.14555337])

0 answers:

No answers yet.