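Hello, I am trying to use VotingClassifier together with my GradientBoostingClassifier, which I wrap in a subclass so that it is fitted with sample_weight. However, I get the error below and cannot figure out how to resolve it.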
Code:
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)

# Wrapper that passes the labels y as sample_weight when fitting
class MyGradientBoostingClassifier(GradientBoostingClassifier):
    def fit(self, X, y=None):
        return super(GradientBoostingClassifier, self).fit(X, y, sample_weight=y)

rf = RandomForestClassifier(n_jobs=-1)
mygb = MyGradientBoostingClassifier()
vc = VotingClassifier(estimators=[('rf', rf), ('mygb', mygb)],
                      voting='soft',
                      weights=[1, 2])
mygb.fit(X5, y5)
A sample of y is [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0.], and it is a NumPy array.
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-62-c56d4cac146f> in <module>()
13 weights=[1,2])
14
---> 15 mygb.fit(X5, y5)
<ipython-input-62-c56d4cac146f> in fit(self, X, y)
3 print np.shape(y), np.shape(X), Counter(y), type(y)
4 print y[:20]
----> 5 return super(GradientBoostingClassifier, self).fit(X, y, sample_weight=y)
6
7
/Users/a/anaconda/lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.pyc in fit(self, X, y, sample_weight, monitor)
987
988 # fit initial model - FIXME make sample_weight optional
--> 989 self.init_.fit(X, y, sample_weight)
990
991 # init predictions
/Users/a/anaconda/lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.pyc in fit(self, X, y, sample_weight)
117
118 if neg == 0 or pos == 0:
--> 119 raise ValueError('y contains non binary labels.')
120 self.prior = self.scale * np.log(pos / neg)
121
ValueError: y contains non binary labels.
Answer 0 (score: 1)
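For a classification model, y should contain integer class labels (0 and 1), so it makes no sense to use it as both the classification target and the sample weights.
With sample_weight=y, every sample of class 0 gets a weight of 0. Samples with zero weight are ignored by the model, so only class 1 samples remain, and a binary classification model cannot be trained on samples from a single class of the training set.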
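To make this concrete, here is a minimal pure-Python sketch of the check that appears in the traceback (`if neg == 0 or pos == 0`). It is an assumption that `pos` and `neg` are weighted sums over the two classes, since the traceback shows only the check itself. The helper name `weighted_class_totals` and the weight factor of 9 are hypothetical, chosen only for illustration.

```python
# Labels copied from the sample of y shown in the question.
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]


def weighted_class_totals(labels, sample_weight):
    """Return the total weight carried by class 1 and by class 0.

    Assumption: this mirrors how pos and neg are computed before the
    `neg == 0 or pos == 0` check seen in the traceback.
    """
    pos = sum(w * label for w, label in zip(sample_weight, labels))
    neg = sum(w * (1 - label) for w, label in zip(sample_weight, labels))
    return pos, neg


# Weights equal to the labels, as in the question's wrapper:
# every class-0 sample gets weight 0, so class 0 carries no weight.
pos_bad, neg_bad = weighted_class_totals(y, sample_weight=y)

# Hypothetical alternative: weights kept separate from the labels,
# here up-weighting the rare class 1 by an arbitrary factor of 9.
weights = [9.0 if label == 1 else 1.0 for label in y]
pos_ok, neg_ok = weighted_class_totals(y, sample_weight=weights)

print(pos_bad, neg_bad)
print(pos_ok, neg_ok)
```

With the labels reused as weights, the class-0 total is zero, which is exactly the condition that raises the ValueError. If the goal is to emphasize the rare class, a separate weight array such as `weights` above can be passed instead, for example `fit(X, y, sample_weight=weights)`, which matches the `fit(self, X, y, sample_weight, monitor)` signature shown in the traceback.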