Deviation in scikit-learn accuracy

Time: 2017-12-14 09:56:56

Tags: scikit-learn classification

  

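I am using scikit-learn ensemble classifiers for classification. I have separate training and test datasets. When I run other machine learning algorithms on the same datasets, I get consistent accuracy. The inconsistency appears only with the ensemble classifiers, even though I set random_state to 0.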

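from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import confusion_matrix

# train_arrays/train_labels and test_arrays/test_labels are the separate training and test sets
bag_classifier = BaggingClassifier(n_estimators=10, random_state=0)
bag_classifier.fit(train_arrays, train_labels)
bag_predict = bag_classifier.predict(test_arrays)
bag_accuracy = bag_classifier.score(test_arrays, test_labels)
bag_cm = confusion_matrix(test_labels, bag_predict)
print("The Bagging Classifier accuracy is : ", bag_accuracy)
print("The Confusion Matrix is ")
print(bag_cm)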

1 answer:

Answer 0: (score: 0)

You will often get different results from the same model because, when the train/test split is drawn randomly, it changes every time the model is trained. You can reproduce the same results by passing a seed value to the train/test split.

from sklearn.model_selection import train_test_split
train, test = train_test_split(your_data, test_size=0.3, random_state=57)  # your_data: placeholder for your dataset

Use the same random_state value on every training run.
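As a rough check of this advice, the sketch below trains the same bagging model twice with fixed seeds and compares the accuracies. The synthetic data from make_classification and the helper name run_once are assumptions made for illustration; they stand in for the asker's train_arrays and test_arrays, which are not shown in the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: the real dataset is not available here).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)


def run_once(split_seed=57, model_seed=0):
    """Split, train, and score once, seeding both the split and the model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=split_seed
    )
    model = BaggingClassifier(n_estimators=10, random_state=model_seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


acc_first = run_once()
acc_second = run_once()
print("Run 1 accuracy:", acc_first)
print("Run 2 accuracy:", acc_second)
print("Identical:", acc_first == acc_second)
```

If the two accuracies still differ in your own setup even though the split and the classifier are both seeded, the variation is probably coming from an earlier step, such as how the feature arrays are built before training.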