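I am trying to use GradientBoostingClassifier in scikit-learn, and it works fine with its default parameters. However, when I try to replace the base estimator (the init parameter) with a different classifier, it fails with the following error:

    return y - np.nan_to_num(np.exp(pred[:, k] -
    IndexError: too many indices

Do you have any suggestion for solving this problem?

The error can be reproduced with the following snippet: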

    import numpy as np
    from sklearn import datasets
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils import shuffle

    mnist = datasets.fetch_mldata('MNIST original')
    X, y = shuffle(mnist.data, mnist.target, random_state=13)
    X = X.astype(np.float32)

    # Use the first 1% of the shuffled data for training, the rest for testing.
    offset = int(X.shape[0] * 0.01)
    X_train, y_train = X[:offset], y[:offset]
    X_test, y_test = X[offset:], y[offset:]

    # Works fine when init is None.
    clf_init = None
    print('Train with clf_init = None')
    clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                     n_estimators=5, subsample=0.3,
                                     min_samples_split=2,
                                     min_samples_leaf=1,
                                     max_depth=3,
                                     init=clf_init,
                                     random_state=None,
                                     max_features=None,
                                     verbose=2,
                                     learn_rate=None)
    clf.fit(X_train, y_train)
    print('Train with clf_init = None is done :-)')

    # Fit a LogisticRegression on its own; this step succeeds.
    print('Train LogisticRegression()')
    clf_init = LogisticRegression()
    clf_init.fit(X_train, y_train)
    print('Train LogisticRegression() is done')

    # Same settings as above, but with the classifier passed as init.
    print('Train with clf_init = LogisticRegression()')
    clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                     n_estimators=5, subsample=0.3,
                                     min_samples_split=2,
                                     min_samples_leaf=1,
                                     max_depth=3,
                                     init=clf_init,
                                     random_state=None,
                                     max_features=None,
                                     verbose=2,
                                     learn_rate=None)
    clf.fit(X_train, y_train)  # <------ ERROR!!!!
    print('Train with clf_init = LogisticRegression() is done')

Here is the full traceback of the error:

    Traceback (most recent call last):
      File "/home/mohsena/Dropbox/programing/gbm/gb_with_init.py", line 56, in <module>
        clf.fit(X_train, y_train)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 862, in fit
        return super(GradientBoostingClassifier, self).fit(X, y)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 614, in fit
        random_state)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 475, in _fit_stage
        residual = loss.negative_gradient(y, y_pred, k=k)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 404, in negative_gradient
        return y - np.nan_to_num(np.exp(pred[:, k] -
    IndexError: too many indices

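The last frame of the traceback indexes `pred` with two indices (`pred[:, k]`), which only works on a 2-D array. A plausible reading, assuming the init estimator's `predict` returns a 1-D array of class labels, is that this array is what ends up in `pred`. The sketch below reproduces the same kind of failure with NumPy alone; the array values are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the output of a classifier's predict():
# a 1-D array with one label per sample, shape (n_samples,).
pred_1d = np.array([0.0, 1.0, 1.0, 0.0])

k = 0
try:
    pred_1d[:, k]  # same indexing pattern as in negative_gradient
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# A 2-D array with one column per class accepts the same indexing.
pred_2d = pred_1d[:, np.newaxis]  # shape (n_samples, 1)
column = pred_2d[:, k]            # shape (n_samples,)
print(pred_2d.shape, column.shape)
```

The exact wording of the error message depends on the NumPy version, but the exception type is the same.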
Answer 0 (score: 9)

An improved version of iampat's answer, with a slight modification of the scikit-learn developers' suggestion, should do the trick.

    import numpy

    class init:
        def __init__(self, est):
            self.est = est

        def predict(self, X):
            # Probability of class 1, reshaped to a column: (n_samples, 1).
            return self.est.predict_proba(X)[:, 1][:, numpy.newaxis]

        def fit(self, X, y):
            self.est.fit(X, y)

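To see what this kind of wrapper returns without training a real model, here is a minimal sketch that runs it against a fake estimator. `ProbaInit` mirrors the wrapper above under a different name, and `FixedProbaEstimator` is a hypothetical stand-in for a fitted two-class classifier; neither is part of scikit-learn.

```python
import numpy as np


class ProbaInit(object):
    """Wrap a classifier so that predict() returns a probability column."""

    def __init__(self, est):
        self.est = est

    def fit(self, X, y):
        self.est.fit(X, y)
        return self

    def predict(self, X):
        # Keep only the class-1 probability and make it 2-D: (n_samples, 1).
        return self.est.predict_proba(X)[:, 1][:, np.newaxis]


class FixedProbaEstimator(object):
    """Hypothetical two-class estimator with hard-coded probabilities."""

    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        p1 = np.linspace(0.1, 0.9, len(X))
        return np.column_stack([1.0 - p1, p1])


X = np.zeros((5, 3))
y = np.array([0, 1, 0, 1, 1])

wrapped = ProbaInit(FixedProbaEstimator()).fit(X, y)
out = wrapped.predict(X)
print(out.shape)     # two-dimensional, one column
print(out[:, 0])     # two-index access now works
```

Note that `[:, 1]` keeps a single class's probability, which fits a two-class problem. For a multi-class dataset such as the MNIST example in the question, a single column would presumably not be enough.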
Answer 1 (score: 5)

As suggested by the scikit-learn developers, the problem can be solved by using an adaptor like this:
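
    # The class line is missing from the post; "Adaptor" is a placeholder name.
    class Adaptor(object):
        def __init__(self, est):
            self.est = est

        def predict(self, X):
            # Probability of class 1 as a 1-D array.
            return self.est.predict_proba(X)[:, 1]

        def fit(self, X, y):
            self.est.fit(X, y)
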
Answer 2 (score: 5)

Here is a complete and, in my opinion, simpler version of iampat's code snippet.
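
    import numpy
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

    class RandomForestClassifier_compability(RandomForestClassifier):
        def predict(self, X):
            # Return the class-1 probability as a column: (n_samples, 1).
            return self.predict_proba(X)[:, 1][:, numpy.newaxis]

    base_estimator = RandomForestClassifier_compability()
    classifier = GradientBoostingClassifier(init=base_estimator)
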
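Answer 3 (score: 4)

Gradient boosting generally requires the base learner to be an algorithm that performs numeric prediction rather than classification. I believe that is your problem.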
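To illustrate why the base learner has to produce numbers, here is a minimal sketch of gradient boosting written with NumPy only. It assumes squared-error loss (not the deviance loss used in the question), and the `Stump` class and the toy data are hypothetical. Each round fits a tiny regressor to the current residuals and adds its real-valued output to the running prediction, which would not be meaningful with class labels.

```python
import numpy as np


class Stump(object):
    """One-split regression tree on a single feature (illustrative only)."""

    def fit(self, x, r):
        # Fall back to a constant prediction if no split is possible.
        self.threshold, self.left, self.right = x.min(), r.mean(), r.mean()
        best_err = np.inf
        # Skipping the largest value keeps both sides of every split non-empty.
        for t in np.unique(x)[:-1]:
            left, right = r[x <= t], r[x > t]
            err = (((left - left.mean()) ** 2).sum()
                   + ((right - right.mean()) ** 2).sum())
            if err < best_err:
                best_err = err
                self.threshold = t
                self.left, self.right = left.mean(), right.mean()
        return self

    def predict(self, x):
        return np.where(x <= self.threshold, self.left, self.right)


rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0.0, 6.0, size=80))
y = np.sin(x)

learning_rate = 0.5
pred = np.full_like(y, y.mean())        # constant initial prediction
initial_mse = np.mean((y - pred) ** 2)

for _ in range(20):
    residual = y - pred                 # negative gradient of 0.5 * (y - pred)**2
    stump = Stump().fit(x, residual)    # numeric base learner fitted to residuals
    pred += learning_rate * stump.predict(x)

final_mse = np.mean((y - pred) ** 2)
print(initial_mse, final_mse)
```

The training error should shrink over the rounds, because each stump predicts the mean residual on each side of its split and a fraction of that is added to the prediction.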