More weak classifiers yield lower accuracy in Sklearn AdaBoost

Time: 2017-04-24 04:06:28

Tags: python python-2.7 machine-learning scikit-learn adaboost

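I'm working on a quick classification project on the MNIST dataset, just to run some timing trials and test out various classifiers.

One of the classifiers I'm interested in is AdaBoost. I want to vary n_est, which I pass as n_estimators and which controls the number of weak estimators used in the algorithm.

When I increase the number of estimators from n_est=100 to n_est=200, the test set accuracy actually decreases, which is not the behavior I expected. Is there any reason this would happen? The test and training sets are partitioned identically each time. The two sets are disjoint and together cover the full dataset.

Here is my code: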

import time
import sklearn.ensemble as skl_ensemble

# Note: n_est, image_array, label_array and print_pretty are defined elsewhere in the script.

# Documentation:
# AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)
# base_estimator - this is what the ensemble is built off of
# n_estimators - the max number of estimators at which boosting is terminated
#                if it's a perfect fit, then we stop early
classifier = skl_ensemble.AdaBoostClassifier(base_estimator=None, n_estimators=n_est)

# Make a header
print_pretty('AdaBoost Classifier Model')

# Train
print('Please wait. Training classifier...')
start = time.time()
score1 = classifier.fit(image_array, label_array).score(image_array, label_array)
end = time.time()
total_time = end - start

# Show results
print('\tClassifier has been trained in time: {0:.3f} seconds with n_est = {1}'.format(total_time, n_est))
print('\tClassifier training accuracy: {0:.3f}%'.format(score1*100))

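The code is relatively simple to reproduce. I'm 99% sure this is not a bug in the code. I created two different classifiers and trained them independently on the same source.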
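
One way to see how accuracy changes with the number of weak learners, without training a separate model for each setting, is scikit-learn's staged_score, which yields the score after each boosting iteration of a single fitted ensemble. This is only a sketch under assumptions: it uses the small load_digits dataset as a stand-in for MNIST so that it runs quickly, and the split, the random_state value and the variable names are illustrative, not taken from my script.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Assumption: load_digits (8x8 digit images) stands in for MNIST here.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Fit once with the largest number of estimators of interest.
# random_state is fixed so that repeated runs are comparable.
clf = AdaBoostClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# staged_score yields the accuracy after 1, 2, ... boosting iterations.
# Boosting may stop early, so the curves can be shorter than n_estimators.
train_curve = list(clf.staged_score(X_train, y_train))
test_curve = list(clf.staged_score(X_test, y_test))

for n in (50, 100, 150, 200):
    if n <= len(test_curve):
        print('n_est = {0}: train {1:.3f}%, test {2:.3f}%'.format(
            n, train_curve[n - 1] * 100, test_curve[n - 1] * 100))
```

If the test curve flattens or dips as estimators are added, the gap between n_est=100 and n_est=200 would be part of a trend rather than a one-off, though that would need checking on the real MNIST data.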

Does anyone know why this behavior would make sense? Here is the output from one of the runs:

========================================
         AdaBoost Classifier Model       
========================================
Please wait. The AdaBoost classifier is being trained...
Classifier has been trained in time: 214.000 seconds with n_est = 100
Classifier training accuracy: 73.237% (should be the same as: 73.237%)
Now predicting with test set...
Test set prediction time: 1.235 seconds
Test set accuracy: 72.960%
Average cross val score: 0.612%
========================================
       AdaBoost Classifier Model       
========================================
Please wait. The AdaBoost classifier is being trained...
Classifier has been trained in time: 505.418 seconds with n_est = 200
Classifier training accuracy: 72.188% (should be the same as: 72.188%)
Now predicting with test set...
Test set prediction time: 1.643 seconds
Test set accuracy: 72.060%
Average cross val score: 0.566%
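
The part of the script that produces the cross-validation line is not shown in the code above. A minimal sketch of such a comparison, assuming cross_val_score with 5 folds and again using load_digits as a stand-in for MNIST, could look like the following. Note that cross_val_score returns accuracies as fractions between 0 and 1, so they are multiplied by 100 here before being printed as percentages.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Assumption: load_digits stands in for MNIST so the example runs quickly.
X, y = load_digits(return_X_y=True)

results = {}
for n_est in (100, 200):
    # Same random_state for both settings so only n_estimators differs.
    clf = AdaBoostClassifier(n_estimators=n_est, random_state=0)
    # 5-fold cross-validation; returns one accuracy (fraction) per fold.
    scores = cross_val_score(clf, X, y, cv=5)
    results[n_est] = scores
    print('n_est = {0}: mean CV accuracy {1:.3f}% (std {2:.3f})'.format(
        n_est, scores.mean() * 100, scores.std() * 100))
```

Comparing the per-fold spread with the difference between the two means gives a rough idea of whether a gap of about one percentage point is larger than the fold-to-fold noise.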

Thanks!

0 answers:

No answers