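I'm working on a quick classification project on the MNIST dataset, just to run some timing trials and test out various classifiers.
One of the classifiers I'm interested in is AdaBoost. I want to vary n_est, the value I pass as n_estimators, which controls the number of weak estimators used in the algorithm.
When I increase the number of estimators from n_est=100 to n_est=200, the test set accuracy actually decreases, which is not the behavior I expected. Is there any reason this would happen? The test and training sets are partitioned identically each time. The sets are disjoint and complete.
Here is my code: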
import time
from sklearn import ensemble as skl_ensemble

# Documentation:
# AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)
# base_estimator - the weak learner the ensemble is built from (None means a depth-1 decision tree)
# n_estimators - the maximum number of estimators at which boosting is terminated;
# if the fit is perfect, boosting stops early
# n_est, image_array, label_array and print_pretty are defined elsewhere in my script.
# base_estimator is left at its default (None), so it is not passed explicitly.
classifier = skl_ensemble.AdaBoostClassifier(n_estimators=n_est)
# Make a header
print_pretty('AdaBoost Classifier Model')
# Train (note: the timed region also includes scoring on the training set)
print('Please wait. Training classifier...')
start = time.time()
score1 = classifier.fit(image_array, label_array).score(image_array, label_array)
end = time.time()
total_time = end - start
# Show results
print('\tClassifier has been trained in time: {0:.3f} seconds with n_est = {1}'.format(total_time, n_est))
print('\tClassifier training accuracy: {0:.3f}%'.format(score1 * 100))
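The code is relatively simple to reproduce. I'm 99% sure there isn't a bug in the code. I created two different classifiers and trained them independently on the same source.
Does anyone know why this behavior would make sense? Here is the output from one of the runs: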
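For anyone who wants to try the comparison without my data-loading code, here is a minimal sketch of it. It is not my exact script: it assumes scikit-learn's `load_digits` (8x8 images) as a small stand-in for MNIST, and the helper `fit_and_score`, the split ratio, and `random_state=0` are illustrative choices. The fixed seeds keep the split and the classifier identical across runs, so only the number of estimators changes.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Assumption: load_digits stands in for MNIST so the run finishes quickly.
X, y = load_digits(return_X_y=True)

# One fixed, stratified split that both classifiers share.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def fit_and_score(n_est):
    """Train one AdaBoost model and return (train accuracy, test accuracy)."""
    clf = AdaBoostClassifier(n_estimators=n_est, random_state=0)
    clf.fit(X_train, y_train)
    return clf.score(X_train, y_train), clf.score(X_test, y_test)


# Two independent classifiers trained on the same data.
results = {n_est: fit_and_score(n_est) for n_est in (100, 200)}

for n_est, (train_acc, test_acc) in results.items():
    print('n_est = {0}: train {1:.3f}, test {2:.3f}'.format(n_est, train_acc, test_acc))
```

Because the default weak learner is a depth-1 tree, the numbers on this smaller dataset will not match the MNIST figures; the point is only the side-by-side comparison.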
========================================
AdaBoost Classifier Model
========================================
Please wait. The AdaBoost classifier is being trained...
Classifier has been trained in time: 214.000 seconds with n_est = 100
Classifier training accuracy: 73.237% (should be the same as: 73.237%)
Now predicting with test set...
Test set prediction time: 1.235 seconds
Test set accuracy: 72.960%
Average cross val score: 0.612%
========================================
AdaBoost Classifier Model
========================================
Please wait. The AdaBoost classifier is being trained...
Classifier has been trained in time: 505.418 seconds with n_est = 200
Classifier training accuracy: 72.188% (should be the same as: 72.188%)
Now predicting with test set...
Test set prediction time: 1.643 seconds
Test set accuracy: 72.060%
Average cross val score: 0.566%
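One way to look at this more closely is to fit once with 200 estimators and read off the accuracy after every boosting iteration using `staged_score`, instead of training separate models. This is only a sketch and not part of my original script: it assumes `load_digits` as a small stand-in for MNIST, and the split ratio and `random_state=0` are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Assumption: load_digits stands in for MNIST so the run finishes quickly.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = AdaBoostClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# staged_score yields the accuracy of the ensemble after each boosting iteration.
train_curve = list(clf.staged_score(X_train, y_train))
test_curve = list(clf.staged_score(X_test, y_test))

# Ensemble size (1-based) at which the test accuracy peaks.
best_n = max(range(len(test_curve)), key=test_curve.__getitem__) + 1

print('Fitted estimators: {0}'.format(len(test_curve)))
print('Best test accuracy {0:.3f} at n = {1}'.format(test_curve[best_n - 1], best_n))
print('Final test accuracy {0:.3f}'.format(test_curve[-1]))
```

If the test curve is not monotone, a drop between 100 and 200 estimators would show up directly as a later point sitting below an earlier one.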
Thanks!