How to parallelize with Jupyter and sklearn?

Date: 2017-01-14 10:53:48

Tags: python scikit-learn jupyter-notebook jupyter ipython-parallel

I am trying to parallelize scikit-learn's GridSearchCV. It runs in a Jupyter(Hub) notebook environment. After some research, I found this code:

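from sklearn.externals.joblib import Parallel, parallel_backend, register_parallel_backend
from sklearn.model_selection import GridSearchCV
from ipyparallel import Client
from ipyparallel.joblib import IPythonParallelBackend

# Connect to the ipyparallel cluster started with the 'myprofile' profile
c = Client(profile='myprofile')
print(c.ids)  # IDs of the available engines
bview = c.load_balanced_view()

# Expose the engines to joblib under the backend name 'ipyparallel'
register_parallel_backend('ipyparallel', lambda: IPythonParallelBackend(view=bview))

# pipeline, param_grid, X_train and Y_train are defined earlier in the notebook
grid = GridSearchCV(pipeline, cv=3, n_jobs=4, param_grid=param_grid)

with parallel_backend('ipyparallel'):
    grid.fit(X_train, Y_train)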

Note that I set n_jobs to 4, which is the number of CPU cores on the machine (this is what nproc returns).
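`nproc` reports the processors the current process is allowed to run on (it honours CPU affinity), which can be fewer than the machine total in some containers. A minimal sketch for reading a comparable number from inside the notebook, assuming a Linux host where `os.sched_getaffinity` is available; the helper name `available_cpus` is hypothetical. scikit-learn also accepts `n_jobs=-1`, meaning "use all CPUs", which avoids hard-coding the count.

```python
import os


def available_cpus():
    """Return the number of CPUs this process may use, similar to `nproc`."""
    if hasattr(os, "sched_getaffinity"):  # Linux: respects the CPU affinity mask
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # fallback: total CPUs, or 1 if unknown


print(available_cpus())
```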

But it does not work: ImportError: cannot import name 'register_parallel_backend', even though I installed joblib with conda install joblib and also tried pip install -U joblib.
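One likely explanation, stated as an assumption: in scikit-learn versions of that time, `sklearn.externals.joblib` was a private copy of joblib bundled inside scikit-learn, so `conda install joblib` or `pip install -U joblib` updates only the separate top-level `joblib` package and leaves the bundled copy unchanged. A minimal sketch to check which of the two exposes `register_parallel_backend` in the current environment; the helper `has_attr` is hypothetical, written only for this check.

```python
import importlib


def has_attr(module_name, attr):
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError on Python 3.6+
        return False
    return hasattr(module, attr)


# Compare the standalone joblib package with the copy bundled in scikit-learn
for name in ("joblib", "sklearn.externals.joblib"):
    print(name, has_attr(name, "register_parallel_backend"))
```

If only the standalone `joblib` reports `True`, the upgrade did take effect, but the import in the code above still points at the older bundled copy.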

So, what is the best way to parallelize GridSearchCV in this environment?

Update

Without ipyparallel, just setting the n_jobs parameter:

grid = GridSearchCV(pipeline, cv=3, n_jobs=4, param_grid=param_grid)
grid.fit(X_train, Y_train)

This produces the following warning:

/opt/conda/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py:540: UserWarning: Multiprocessing-backed parallel loops cannot be nested, setting n_jobs=1

So it seems to end up running sequentially rather than in parallel.
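For context, and as an assumption about the joblib versions bundled at the time: joblib emits this warning when its multiprocessing backend decides it cannot start worker processes from the current context, notably when the calling process is itself a daemonic multiprocessing worker (daemonic processes are not allowed to have children); a related variant of the message covers calls made from a non-main thread. A minimal diagnostic sketch to run in a notebook cell; the helper `nesting_diagnostics` is hypothetical and only reports the conditions, it does not change them.

```python
import multiprocessing
import threading


def nesting_diagnostics():
    """Report the process/thread conditions that can make joblib fall back to n_jobs=1."""
    proc = multiprocessing.current_process()
    return {
        "process_name": proc.name,
        "is_daemon": proc.daemon,  # daemonic processes cannot spawn children
        "in_main_thread": threading.current_thread() is threading.main_thread(),
    }


print(nesting_diagnostics())
```

If `is_daemon` is `True` inside the kernel, the fallback to sequential execution would be expected under this assumption.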
