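I am trying to use AdaBoostClassifier with decision stumps as the base classifier. I noticed that the weight adjustment performed by AdaBoostClassifier gives me errors with both the SAMME.R and SAMME options.
Here is a brief overview of what I am doing: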
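import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def train_adaboost(features, labels):
    uniqLabels = np.unique(labels)
    allLearners = []
    # One-vs-rest: train 10 boosted ensembles per target label
    for targetLab in uniqLabels:
        runs = []
        for rrr in xrange(10):
            # get_binary_sets is my own helper that builds a binary problem for targetLab
            feats, labs = get_binary_sets(features, labels, targetLab)
            baseClf = DecisionTreeClassifier(max_depth=1, min_samples_leaf=1)
            baseClf.fit(feats, labs)
            ada_real = AdaBoostClassifier(base_estimator=baseClf,
                                          learning_rate=1,
                                          n_estimators=20,
                                          algorithm="SAMME")
            runs.append(ada_real.fit(feats, labs))
        allLearners.append(runs)
    return allLearners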
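I checked the fit of each individual decision tree classifier, and they are able to predict some of the labels. However, when I look at the AdaBoostClassifier built on this base classifier, I get errors related to the weight boosting algorithm.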
def compute_confidence(allLearners, dada, labbo):
    for ii, thisLab in enumerate(allLearners):
        for jj, thisLearner in enumerate(thisLab):
            # accessing thisLearner's methods here
            pass
Those methods produce errors like the following:
ipdb> thisLearner.predict_proba(myData)
PATHTOPACKAGE/lib/python2.7/site-packages/sklearn/ensemble/weight_boosting.py:727: RuntimeWarning: invalid value encountered in double_scalars
proba /= self.estimator_weights_.sum()
*** ValueError: 'axis' entry is out of bounds
ipdb> thisLearner.predict(myData)
PATHTOPACKAGE/lib/python2.7/site-packages/sklearn/ensemble/weight_boosting.py:639: RuntimeWarning: invalid value encountered in double_scalars
pred /= self.estimator_weights_.sum()
*** IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index
I also tried the SAMME.R algorithm for AdaBoost, but with it I cannot even fit the AdaBoost classifier because of this error:
[...]
File "PATH/sklearn/ensemble/weight_boosting.py", line 388, in fit
return super(AdaBoostClassifier, self).fit(X, y, sample_weight)
File "PATH/sklearn/ensemble/weight_boosting.py", line 124, in fit
X_argsorted=X_argsorted)
File "PATH/sklearn/ensemble/weight_boosting.py", line 435, in _boost
X_argsorted=X_argsorted)
File "PATH/sklearn/ensemble/weight_boosting.py", line 498, in _boost_real
(estimator_weight < 0)))
ValueError: non-broadcastable output operand with shape (1000) doesn't match the broadcast shape (1000,1000)
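The dimensions of the data are actually compatible with the format the classifier expects, both before using AdaBoost and when I try to test the trained classifier. What do these errors indicate?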
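Answer 0 (score: 1)
This is a bit counterintuitive when you come from Matlab coding.
Apparently the shape of labs was the problem: it was (1000,1), and it needs to be (1000,).
Adding this line solved it:
labs = labs[:, 0]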
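For anyone wondering where the (1000,1000) in the traceback comes from, here is a minimal NumPy sketch of the likely mechanism. It is an illustration only, not scikit-learn's actual code: the names `labs_col` and `sample_weight` and the small size `n = 5` are made up for the example. The point is that combining a 1-D array of shape (n,) with a column vector of shape (n, 1) broadcasts to (n, n), and an in-place update cannot store that result in the 1-D array.

```python
import numpy as np

n = 5  # small stand-in for the 1000 samples in the question

labs_col = np.ones((n, 1))      # column vector, shape (n, 1), as a Matlab habit produces
sample_weight = np.ones(n) / n  # 1-D array, shape (n,)

# Combining (n,) with (n, 1) broadcasts to (n, n)
product_shape = (sample_weight * labs_col).shape
print(product_shape)

# An in-place update cannot grow the (n,) output to (n, n), so NumPy raises
raised = False
try:
    sample_weight *= labs_col
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Any of these yields the 1-D shape (n,) that the classifier expects for labels
labs_flat = labs_col[:, 0]
print(labs_flat.shape, labs_col.ravel().shape, labs_col.reshape(-1).shape)
```

`labs[:, 0]` only makes sense when the labels really are a single column; `ravel()` or `reshape(-1)` are equivalent alternatives in that case.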