How many combinations will GridSearchCV run for this?

Time: 2018-03-14 16:43:52

Tags: machine-learning scikit-learn random-forest grid-search

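I'm using sklearn to run a grid search on a random forest classifier. It has been running longer than I expected, and I'd like to estimate how much longer the process will take. I thought the total number of fits it would do is 3 * 3 * 3 * 3 * 5 = 405.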

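from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

clf = RandomForestClassifier(n_jobs=-1, oob_score=True, verbose=1)
param_grid = {'n_estimators': [50, 200, 500],
              'max_depth': [2, 3, 5],
              'min_samples_leaf': [1, 2, 5],
              'max_features': ['auto', 'log2', 'sqrt']
              }

# X and y are pandas objects holding the features and the target
gscv = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5)
gscv.fit(X.values, y.values.reshape(-1,))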

From the output, I can see it cycling through batches of tasks, where each batch has as many tasks as there are estimators:

[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.2min
[Parallel(n_jobs=-1)]: Done 184 tasks | elapsed: 5.3min
[Parallel(n_jobs=-1)]: Done 200 out of 200 tasks | elapsed: 6.2min finished
[Parallel(n_jobs=8)]: Done 34 tasks | elapsed: 0.5s
[Parallel(n_jobs=8)]: Done 184 tasks | elapsed: 3.0s
[Parallel(n_jobs=8)]: Done 200 tasks out of 200 tasks | elapsed: 3.2s finished
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 1.1min
[Parallel(n_jobs=-1)]: Done 50 tasks out of 50 tasks | elapsed: 1.5min finished
[Parallel(n_jobs=8)]: Done 34 tasks | elapsed: 0.5s
[Parallel(n_jobs=8)]: Done 50 out of 50 tasks | elapsed: 0.8s finished

I counted the "finished" lines and the count is now at 680. I thought it would be done at 405. Is my calculation wrong?

1 Answer:

Answer 0 (score: 2)

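Your calculation looks correct: the number of grid points is the product of the number of values for each parameter, in this case 3 * 3 * 3 * 3 = 81.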
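As a quick check of that product, here is a minimal sketch that enumerates the grid from the question using only the standard library. The names `grid_points`, `n_candidates`, `n_fits` and `cv_folds` are illustrative, not part of scikit-learn.

```python
from itertools import product

# Parameter grid copied from the question.
param_grid = {
    'n_estimators': [50, 200, 500],
    'max_depth': [2, 3, 5],
    'min_samples_leaf': [1, 2, 5],
    'max_features': ['auto', 'log2', 'sqrt'],
}
cv_folds = 5  # matches cv=5 in the question

# Each grid point picks one value from every list.
grid_points = list(product(*param_grid.values()))
n_candidates = len(grid_points)

# Every grid point is fitted once per cross-validation fold.
n_fits = n_candidates * cv_folds

print(n_candidates)  # 81
print(n_fits)        # 405
```

With scikit-learn installed, `len(ParameterGrid(param_grid))` from `sklearn.model_selection` should give the same candidate count. Note also that with the default `refit=True`, GridSearchCV fits the best candidate once more on the full data after the search, which would add one fit to the 405.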

For each of these you have five cross-validation folds, for a total of 405 fits. verbose is an entirely separate indicator.

That setting is passed through to the parent class BaseForest, and from there to joblib's Parallel.

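I'm not sure what constitutes a task in this case, but the number of top-level grid combinations should be 405. Keep in mind that each of those combinations is itself an ensemble of trees.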
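Since each combination is an ensemble, the amount of work is better measured in trees than in fits. The sketch below counts trees across the whole search and turns that into a very rough time estimate. It rests on two assumptions that the thread does not confirm: that one joblib task in the log corresponds to one tree, and that the 6.2 minutes reported for 200 tasks is a representative per-tree cost. The names `total_trees` and `seconds_per_tree` are illustrative.

```python
from itertools import product

# Parameter grid and fold count copied from the question.
param_grid = {
    'n_estimators': [50, 200, 500],
    'max_depth': [2, 3, 5],
    'min_samples_leaf': [1, 2, 5],
    'max_features': ['auto', 'log2', 'sqrt'],
}
cv_folds = 5

keys = list(param_grid)
total_trees = 0
for values in product(*param_grid.values()):
    params = dict(zip(keys, values))
    # One forest per fold, each with n_estimators trees.
    total_trees += params['n_estimators'] * cv_folds

print(total_trees)  # 101250

# Assumption: per-tree cost taken from the log line
# "Done 200 out of 200 tasks | elapsed: 6.2min".
seconds_per_tree = 6.2 * 60 / 200
estimated_hours = total_trees * seconds_per_tree / 3600
print(round(estimated_hours, 1))
```

Treat the resulting figure as an order-of-magnitude guide only: trees with different max_depth or max_features settings will not all cost the same, and the prediction passes that appear in the log are not counted here.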