Scikit-learn GradientBoostingClassifier random_state has no effect

Date: 2017-06-01 06:02:58

Tags: syntax parameters scikit-learn classification boosting

So I was trying out different classifiers in sklearn and noticed that GradientBoostingClassifier always returns the same score, no matter what value I pass for its random_state parameter. For example, when I run the following code:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

scores = []
for i in range(10):
    clf = GradientBoostingClassifier(random_state=i).fit(X_train, y_train)
    score = clf.score(X_test, y_test)
    scores = np.append(scores, score)
print(scores)

the output is:

[ 0.66666667  0.66666667  0.66666667  0.66666667  0.66666667  0.66666667
0.66666667  0.66666667  0.66666667  0.66666667]

However, when I run the same thing with another classifier, say RandomForest:

from sklearn.ensemble import RandomForestClassifier

scores = []
for i in range(10):
    clf = RandomForestClassifier(random_state=i).fit(X_train, y_train)
    score = clf.score(X_test, y_test)
    scores = np.append(scores, score)
print(scores)

the output is what you would expect, i.e. it varies slightly:

[ 0.6         0.56666667  0.63333333  0.76666667  0.6         0.63333333
0.66666667  0.56666667  0.66666667  0.53333333]

What could cause GradientBoostingClassifier to ignore the random state? I printed the classifier to check its parameters, but everything looks normal:

print(clf)
GradientBoostingClassifier(criterion='friedman_mse', init=None,
          learning_rate=0.1, loss='deviance', max_depth=3,
          max_features=None, max_leaf_nodes=None,
          min_impurity_split=1e-07, min_samples_leaf=1,
          min_samples_split=2, min_weight_fraction_leaf=0.0,
          n_estimators=100, presort='auto', random_state=9,
          subsample=1.0, verbose=0, warm_start=False)

I tried playing with warm_start and presort, but neither changed anything. Any ideas? I've been trying to figure this out for almost an hour, so I thought I'd ask here. Thanks for your time!
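For reference, here is a minimal sketch of a check I would consider (this is my assumption, not a confirmed explanation): with the default subsample=1.0 and max_features=None, each boosting stage is fit on the full data with all features, so the fit may simply be deterministic and random_state would have nothing to randomize. Lowering subsample should introduce actual randomness into the fit:

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Assumption: with subsample < 1.0 each stage is fit on a random
# subset of rows, so random_state should now affect the scores.
scores = []
for i in range(10):
    clf = GradientBoostingClassifier(subsample=0.5, random_state=i)
    clf.fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))
print(scores)
```

If the scores vary across seeds here but not with the defaults, that would suggest random_state is being read, just not used by the default (deterministic) configuration.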

0 Answers:

No answers yet