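As far as I understand, GridSearchCV should find the best parameters for a model based on the mean cross_val_score() evaluation (see the description of the best_score_ result field).
However, I get a max_depth that does not look optimal. For example, if I pass max_depth=2 manually, I get a better result than the one GridSearchCV returns, even though 2 is in the max_depth grid list. Yet GridSearchCV decides that max_depth=8 is better than max_depth=2.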
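When I manually test the 'best' max_depth value it found, the result turns out not to be the best.
Could you please explain why the 'exact' parameter found by GridSearchCV gives a worse cross_val_score result than the first arbitrary parameter I tried?
For details, see the code below.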
Update. I updated the code to pass random_state to RandomForestClassifier (as Nain suggested). Now the scores are very close but not exactly the same. Why do they differ?
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
DATA_LEN = 500
NUM_CLASSES = 3
NUM_VARS = 20
np.random.seed(0)
# data generation: for feature i, the first DATA_LEN / (i + 2) rows copy the target y, the remaining rows are random
y = np.random.choice(NUM_CLASSES, DATA_LEN)
xs = []
for j in range(DATA_LEN):
    row = []
    for i in range(NUM_VARS):
        rand_int = np.random.choice(NUM_CLASSES)
        if j < int(DATA_LEN / (i + 2)):
            row.append(y[j])
        else:
            row.append(rand_int)
    xs.append(row)
X = np.array(xs).reshape(DATA_LEN, -1)
# predict: cross_val_score with max_depth=2 and max_depth=5 (the latter is the value found by GridSearchCV)
np.random.seed(0)
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(X, y)
np.random.seed(0)
# average accuracy is '0.579718326935'
print(np.average(cross_val_score(clf, X, y, cv=5, scoring='accuracy')))
np.random.seed(0)
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
clf.fit(X, y)
np.random.seed(0)
# average accuracy is '0.596127702566' which is not exactly the same as best score '0.59799999999999998'
print(np.average(cross_val_score(clf, X, y, cv=5, scoring='accuracy')))
grid_params = {'max_depth': range(2, 20)}
clf = RandomForestClassifier(n_estimators=100, random_state=0)
np.random.seed(0)
clf_searcher = GridSearchCV(clf, grid_params, scoring='accuracy', cv=5)
clf_searcher.fit(X, y)
# outputs best score '0.59799999999999998'
print('best_score=', clf_searcher.best_score_)
# outputs "best_params= {'max_depth': 5}"
print('best_params=', clf_searcher.best_params_)
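
To narrow down where the two numbers diverge, one option is to compare the scores fold by fold instead of comparing only the averages. The sketch below is not the code from the question: it assumes a stand-in dataset from make_classification, a smaller forest, and a hypothetical helper named fold_scores_from_search. It passes one explicit StratifiedKFold object to both cross_val_score and GridSearchCV so the splits are the same, then reads the per-fold scores back from cv_results_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Stand-in data (an assumption; the question builds X and y by hand).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

N_SPLITS = 5
# One explicit splitter shared by both calls, so the folds are identical.
cv = StratifiedKFold(n_splits=N_SPLITS)

depths = [2, 5]
searcher = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    {'max_depth': depths}, scoring='accuracy', cv=cv)
searcher.fit(X, y)


def fold_scores_from_search(searcher, params, n_splits):
    """Hypothetical helper: per-fold test scores of one candidate in cv_results_."""
    index = searcher.cv_results_['params'].index(params)
    return np.array([searcher.cv_results_['split%d_test_score' % k][index]
                     for k in range(n_splits)])


results = {}
for depth in depths:
    clf = RandomForestClassifier(n_estimators=20, max_depth=depth, random_state=0)
    manual = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
    from_search = fold_scores_from_search(searcher, {'max_depth': depth}, N_SPLITS)
    results[depth] = (manual, from_search)
    print('max_depth=%d' % depth)
    print('  cross_val_score folds:', manual, 'mean:', manual.mean())
    print('  GridSearchCV folds:   ', from_search, 'mean:', from_search.mean())
```

If the per-fold scores agree but best_score_ still differs from the plain mean, the difference comes from how the folds are averaged rather than from the models themselves. Older scikit-learn releases had an iid parameter on GridSearchCV that weighted each fold by its test-set size, so it may be worth checking that setting on the installed version.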