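I am running a scikit-learn pipeline with cross-validation via cross_val_score.
After several runs, though, the score for each fold is always exactly the same. This puzzles me: shouldn't the splits be random?
Here is the relevant part of my code: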
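from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# clean_text_custom, MAX_NB_WORDS, data and binary_label_data are defined elsewhere
pipeline = Pipeline([
    ('vect', CountVectorizer(preprocessor=clean_text_custom, max_features=MAX_NB_WORDS, strip_accents='unicode')),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=-1)),
])
cross_val_score(pipeline, data, binary_label_data, cv=5, scoring='f1_micro')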
# array([ 0.25129587, 0.37780563, 0.33195376, 0.31269861, 0.14555337])
# then I run it again and I get the exact same scores for each fold
cross_val_score(pipeline, data, binary_label_data, cv=5, scoring='f1_micro')
# array([ 0.25129587, 0.37780563, 0.33195376, 0.31269861, 0.14555337])
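For reference, here is a minimal sketch of what seems to be going on. An integer cv makes scikit-learn build a KFold (or a StratifiedKFold for binary/multiclass classification targets) with shuffle=False, so the folds are cut in the same order on every call. The toy array X and the helper fold_test_indices are made up for illustration and only stand in for the real data.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-in for `data` (assumption: 10 samples, 2 features).
X = np.arange(20).reshape(10, 2)

def fold_test_indices(cv):
    """Return the test indices of every fold as a list of lists."""
    return [test.tolist() for _, test in cv.split(X)]

# Unshuffled 5-fold splitter: folds are contiguous blocks in sample order.
plain = KFold(n_splits=5)
plain_folds = fold_test_indices(plain)
same_every_time = plain_folds == fold_test_indices(plain)

# Shuffled splitters: the seed decides which samples land in which fold.
shuffled_a = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=0))
shuffled_b = fold_test_indices(KFold(n_splits=5, shuffle=True, random_state=1))
```

If randomized folds are what you want, passing something like cv=KFold(n_splits=5, shuffle=True) to cross_val_score (leaving random_state unset) should give a different split, and therefore different scores, on each call. Fixing random_state makes a shuffled split reproducible again.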