GridSearchCV does not find the best parameters for a random forest

Date: 2018-01-30 17:33:35

Tags: python numpy machine-learning scikit-learn random-forest

As far as I understand, GridSearchCV should find the best parameters for a model based on the average cross_val_score() evaluation (see the documentation of the best_score_ result field).
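To make that expectation concrete, here is a minimal sketch that puts the per-candidate mean scores stored in cv_results_ next to the means computed by hand with cross_val_score. The toy data from make_classification, the small grid, and the names X_demo, y_demo, search_means and manual_means are assumptions for illustration and are not part of my real code. Both calls share one explicit StratifiedKFold splitter so that the folds are identical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Hypothetical toy data, not the data set from the question.
X_demo, y_demo = make_classification(
    n_samples=150, n_features=10, n_informative=4, n_classes=3, random_state=0
)

# One explicit splitter shared by both calls, so the folds are identical.
cv = StratifiedKFold(n_splits=5)

searcher = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    {'max_depth': [2, 5]},
    scoring='accuracy',
    cv=cv,
)
searcher.fit(X_demo, y_demo)

# Mean cross-validated accuracy per candidate, as stored by the search.
search_means = dict(zip(
    (params['max_depth'] for params in searcher.cv_results_['params']),
    searcher.cv_results_['mean_test_score'],
))

# The same quantity computed by hand with cross_val_score.
manual_means = {
    depth: np.mean(cross_val_score(
        RandomForestClassifier(n_estimators=20, max_depth=depth, random_state=0),
        X_demo, y_demo, scoring='accuracy', cv=cv,
    ))
    for depth in search_means
}

for depth in sorted(search_means):
    print(depth, search_means[depth], manual_means[depth])
print('best_score_ =', searcher.best_score_)
```

best_score_ is the largest entry of mean_test_score. Whether the two printed columns agree to the last digit may depend on the scikit-learn version, so the sketch prints them side by side instead of assuming a result.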

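However, I get a max_depth that does not look optimal. For example, if I pass max_depth=2 manually, I get a better result than the one GridSearchCV returns, even though 2 is in the max_depth grid. Nevertheless, GridSearchCV decides that 8 is better than 2 for max_depth.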

When I manually test the 'best' max_depth value that it found, I see that it is not actually the best.

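Could you please explain why the parameter that GridSearchCV reports as the 'exact' best gives a worse cross_val_score result than the first arbitrary parameter I tried?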

Please see the code below for details.

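Update. I updated the code to pass the random_state parameter to RandomForestClassifier (as Nain suggested). Now the scores are very close but still not identical. Why are they not the same?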

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

DATA_LEN = 500
NUM_CLASSES = 3
NUM_VARS = 20

np.random.seed(0)

# data generation: feature i copies the target y for the first DATA_LEN / (i + 2) samples and is random for the rest

y = np.random.choice(NUM_CLASSES, DATA_LEN)

xs = []

for j in range(DATA_LEN):
    row = []
    for i in range(NUM_VARS):       
        rand_int = np.random.choice(NUM_CLASSES)

        if j < int(DATA_LEN / (i + 2)):
            row.append(y[j])
        else:
            row.append(rand_int)

    xs.append(row)

X = np.array(xs).reshape(DATA_LEN, -1)

# predict: cross_val_score with max_depth=2 and max_depth=5 (the latter is the value found by GridSearchCV)

np.random.seed(0)
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
# note: cross_val_score refits clones of clf on each fold, so this fit does not affect the score
clf.fit(X, y)
np.random.seed(0)
# average accuracy is '0.579718326935'
print(np.average(cross_val_score(clf, X, y, cv=5, scoring='accuracy')))

np.random.seed(0)
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
clf.fit(X, y)
np.random.seed(0)
# average accuracy is '0.596127702566' which is not exactly the same as best score '0.59799999999999998'
print(np.average(cross_val_score(clf, X, y, cv=5, scoring='accuracy')))

grid_params = {'max_depth': list(range(2, 20))}
clf = RandomForestClassifier(n_estimators=100, random_state=0)

np.random.seed(0)
clf_searcher = GridSearchCV(clf, grid_params, scoring='accuracy', cv=5)
clf_searcher.fit(X, y)

# outputs best score '0.59799999999999998'
print('best_score=', clf_searcher.best_score_)
# outputs "best_params= {'max_depth': 5}"
print('best_params=', clf_searcher.best_params_)
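One possible explanation for the small remaining gap, which nothing in this post confirms: scikit-learn releases from that period had an iid option on GridSearchCV that defaulted to True and weighted each fold's score by the size of its test fold, whereas np.average(cross_val_score(...)) is a plain mean over folds. When the folds are not all the same size, the two averages differ slightly. The sketch below shows only the arithmetic; the per-fold counts are invented for illustration and are not taken from my run.

```python
# Hypothetical per-fold results: (correct predictions, test-fold size).
# These numbers are invented for illustration only.
folds = [(62, 101), (58, 101), (61, 100), (57, 99), (60, 99)]

fold_accuracies = [correct / size for correct, size in folds]

# Plain mean of the fold accuracies, as np.average(cross_val_score(...)) computes it.
unweighted_mean = sum(fold_accuracies) / len(fold_accuracies)

# Size-weighted mean: total correct predictions over total test samples.
weighted_mean = sum(correct for correct, _ in folds) / sum(size for _, size in folds)

print('unweighted mean:', unweighted_mean)
print('weighted mean:  ', weighted_mean)
```

If this is the cause, passing an explicit splitter with equally sized folds, or comparing against a size-weighted average of the cross_val_score output, should make the two numbers line up.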

0 answers:

No answers yet