Cannot parallelize with KNeighborsClassifier

Date: 2018-10-21 22:20:33

Tags: python-3.x scikit-learn jupyter-notebook joblib

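I am trying to train and cross-validate sklearn's KNeighborsClassifier on MNIST. With n_jobs=None I can fit the model, but as soon as I put the model into a parallel context (e.g. RandomizedSearchCV or cross_val_score with n_jobs=-1), no progress is made. I have tried this in both a Jupyter notebook and a regular Python 3.6 session. RandomForestClassifier with RandomizedSearchCV works as expected.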

Sometimes I can interrupt the process with Ctrl+C or by interrupting the kernel; other times I cannot, and I have to SIGKILL the process.

The following code works:

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()
knn_clf.fit(X, y)  # X, y: the MNIST features and labels

However, the following hangs and never finishes, even though n_jobs is not set:

from sklearn.model_selection import cross_val_score

cross_val_score(knn_clf, X, y,
  cv=3,
  scoring="accuracy",
  verbose=3
)

I am running Anaconda Python 3.6.6 on Ubuntu 18.04.1 with the following conda environment:

# packages in environment at /home/user/anaconda3/envs/ml:
#
# Name                    Version                   Build  Channel
absl-py                   0.5.0                     <pip>
asn1crypto                0.24.0                   py36_0  
astor                     0.7.1                     <pip>
backcall                  0.1.0                    py36_0  
backports                 1.0                      py36_1  
backports.functools_lru_cache 1.5                        py_1    conda-forge
bcrypt                    3.1.4            py36h14c3975_0  
blas                      1.0                    openblas  
bleach                    3.0.2                    py36_0  
bokeh                     0.13.0                   py36_0  
bzip2                     1.0.6                h14c3975_5  
ca-certificates           2018.03.07                    0  
certifi                   2018.10.15               py36_0  
cffi                      1.11.5           py36he75722e_1  
click                     7.0                      py36_0  
cloudpickle               0.6.1                    py36_0  
cryptography              2.3.1            py36hc365091_0  
cryptography-vectors      2.3.1                    py36_0  
cycler                    0.10.0                   py36_0  
cytoolz                   0.9.0.1          py36h14c3975_1  
dask                      0.19.4                   py36_0  
dask-core                 0.19.4                   py36_0  
dask-glm                  0.1.0                    py36_0  
dask-ml                   0.10.0                   py36_0  
dbus                      1.13.2               h714fa37_1  
decorator                 4.3.0                    py36_0  
distributed               1.23.3                   py36_0  
entrypoints               0.2.3                    py36_2  
eventlet                  0.23.0                   py36_0    conda-forge
expat                     2.2.6                he6710b0_0  
ffmpeg                    4.0                  hcdf2ecd_0  
flask                     1.0.2                    py36_1  
flask-socketio            3.0.2                    py36_0  
fontconfig                2.13.0               h9420a91_0  
freetype                  2.9.1                h8a8886c_1  
futures-compat            1.0                       py3_0  
gast                      0.2.0                     <pip>
gettext                   0.19.8.1             hd7bead4_3  
glib                      2.56.2               hd408876_0  
gmp                       6.1.2                h6c8ec71_1  
greenlet                  0.4.15           py36h7b6447c_0  
grpcio                    1.15.0                    <pip>
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               hb453b48_1  
h5py                      2.8.0            py36h989c5e5_3  
hdf5                      1.10.2               hba1933b_1  
heapdict                  1.0.0                    py36_2  
html5lib                  1.0.1                    py36_0  
icu                       58.2                 h9c2bf20_1  
idna                      2.7                      py36_0  
imageio                   2.4.1                    py36_0  
intel-openmp              2019.0                      118  
ipykernel                 5.1.0            py36h39e3cac_0  
ipython                   7.0.1            py36h39e3cac_0  
ipython_genutils          0.2.0                    py36_0  
ipywidgets                7.4.2                    py36_0  
itsdangerous              0.24                     py36_1  
jedi                      0.13.1                   py36_0  
jinja2                    2.10                     py36_0  
joblib                    0.12.5                   py36_0  
jpeg                      9b                   h024ee3a_2  
jsonschema                2.6.0                    py36_0  
jupyter                   1.0.0                    py36_7  
jupyter_client            5.2.3                    py36_0  
jupyter_console           6.0.0                    py36_0  
jupyter_core              4.4.0                    py36_0  
Keras                     2.2.2                     <pip>
Keras-Applications        1.0.5                     <pip>
Keras-Preprocessing       1.0.3                     <pip>
kiwisolver                1.0.1            py36hf484d3e_0  
libedit                   3.1.20170329         h6b74fdf_2  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 8.2.0                hdf63c60_1  
libgfortran               3.0.0                         1    conda-forge
libgfortran-ng            7.3.0                hdf63c60_0  
libiconv                  1.15                 h63c8f33_5  
libopenblas               0.3.3                h5a2b251_3  
libopus                   1.2.1                hb9ed12e_0  
libpng                    1.6.35               hbc83047_0  
libsodium                 1.0.16               h1bed415_0  
libstdcxx-ng              8.2.0                hdf63c60_1  
libtiff                   4.0.9                he85c1e1_2  
libuuid                   1.0.3                h1bed415_2  
libvpx                    1.7.0                h439df22_0  
libxcb                    1.13                 h1bed415_1  
libxml2                   2.9.8                h26e45fe_1  
llvmlite                  0.25.0           py36hd408876_0  
locket                    0.2.0                    py36_1  
Markdown                  3.0.1                     <pip>
markupsafe                1.0              py36h14c3975_1  
matplotlib                3.0.0            py36h5429711_0  
mistune                   0.8.4            py36h7b6447c_0  
mkl                       2019.0                      118  
mkl_fft                   1.0.1            py36h3010b51_0  
mkl_random                1.0.1            py36h629b387_0  
moviepy                   0.2.3.5                   <pip>
msgpack-python            0.5.6            py36h6bb024c_1  
multipledispatch          0.6.0                    py36_0  
nbconvert                 5.3.1                    py36_0  
nbformat                  4.4.0                    py36_0  
ncurses                   6.1                  hf484d3e_0  
networkx                  2.2                      py36_1  
nomkl                     3.0                           0  
notebook                  5.7.0                    py36_0  
numba                     0.40.0           py36h962f231_0  
numpy                     1.15.2           py36h99e49ec_1  
numpy-base                1.15.2           py36h2f8d375_1  
olefile                   0.46                     py36_0  
openblas                  0.3.3                         3  
openblas-devel            0.3.3                         3  
opencv3                   3.1.0                    py36_0    menpo
openssl                   1.0.2p               h14c3975_0  
packaging                 18.0                     py36_0  
pandas                    0.23.4           py36h04863e7_0  
pandoc                    2.2.3.2                       0  
pandocfilters             1.4.2                    py36_1  
paramiko                  2.4.2                    py36_0  
parso                     0.3.1                    py36_0  
partd                     0.3.9                    py36_0  
patsy                     0.5.0                    py36_0  
pcre                      8.42                 h439df22_0  
pexpect                   4.6.0                    py36_0  
pickleshare               0.7.5                    py36_0  
pillow                    5.3.0            py36h34e0f95_0  
pip                       10.0.1                   py36_0  
prometheus_client         0.4.2                    py36_0  
prompt_toolkit            2.0.6                    py36_0  
protobuf                  3.6.1                     <pip>
psutil                    5.4.7            py36h14c3975_0  
pthread-stubs             0.3                  h0ce48e5_1  
ptyprocess                0.6.0                    py36_0  
pyasn1                    0.4.4            py36h28b3542_0  
pycparser                 2.19                     py36_0  
pygments                  2.2.0                    py36_0  
pynacl                    1.3.0            py36h7b6447c_0  
pyopenssl                 18.0.0                   py36_0  
pyparsing                 2.2.2                    py36_0  
pyqt                      5.9.2            py36h05f1152_2  
python                    3.6.6                h6e4f718_2  
python-dateutil           2.7.3                    py36_0  
python-engineio           2.3.2                    py36_0  
python-socketio           2.0.0                    py36_0  
pytz                      2018.5                   py36_0  
pywavelets                1.0.1            py36hdd07704_0  
pyyaml                    3.13             py36h14c3975_0  
pyzmq                     17.1.2           py36h14c3975_0  
qt                        5.9.6                h8703b6f_2  
qtconsole                 4.4.2                    py36_0  
readline                  7.0                  h7b6447c_5  
scikit-image              0.14.0           py36hf484d3e_1  
scikit-learn              0.20.0           py36h22eb022_1  
scipy                     1.1.0            py36he2b7bc3_1  
seaborn                   0.9.0                    py36_0  
send2trash                1.5.0                    py36_0  
setuptools                40.4.3                   py36_0  
setuptools                39.1.0                    <pip>
simplegeneric             0.8.1                    py36_2  
sip                       4.19.8           py36hf484d3e_0  
six                       1.11.0                   py36_1  
sortedcontainers          2.0.5                    py36_0  
sqlite                    3.25.2               h7b6447c_0  
statsmodels               0.9.0            py36h035aef0_0  
tblib                     1.3.2                    py36_0  
tensorboard               1.11.0                    <pip>
tensorflow-gpu            1.11.0                    <pip>
termcolor                 1.1.0                     <pip>
terminado                 0.8.1                    py36_1  
testpath                  0.4.2                    py36_0  
tk                        8.6.8                hbc83047_0  
toolz                     0.9.0                    py36_0  
tornado                   5.1.1            py36h7b6447c_0  
tqdm                      4.26.0                    <pip>
traitlets                 4.3.2                    py36_0  
wcwidth                   0.1.7                    py36_0  
webencodings              0.5.1                    py36_1  
werkzeug                  0.14.1                   py36_0  
wheel                     0.32.1                   py36_0  
widgetsnbextension        3.4.2                    py36_0  
xorg-libxau               1.0.8                h470a237_6    conda-forge
xorg-libxdmcp             1.1.2                h470a237_7    conda-forge
xz                        5.2.4                h14c3975_4  
yaml                      0.1.7                had09818_2  
zeromq                    4.2.5                hf484d3e_1  
zict                      0.1.3                    py36_0  
zlib                      1.2.11               ha838bed_2  

1 Answer:

Answer 0 (score: 1)

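I think this is just a matter of computation time. KNN on the full MNIST training set (60k vectors with 784 features each) takes about 9 minutes (times n cross-validation folds, times m RandomizedSearch combinations, divided by p processors), which is much slower than most other classifiers, which finish in a few seconds.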
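As a rough sketch of that arithmetic: the function name and the example counts of folds, candidates and processors below are illustrative assumptions, not measurements. It ignores scheduling overhead and treats every run as costing the same as the single 9-minute run.

```python
def estimated_minutes(single_run_min, n_folds, n_candidates, n_procs):
    """Back-of-the-envelope wall-clock estimate for a cross-validated search.

    Assumes every (fold, candidate) run costs about the same as one
    standalone run and that work divides evenly over the processors.
    """
    total_runs = n_folds * n_candidates
    return single_run_min * total_runs / n_procs


# Plain 3-fold cross_val_score on one processor, at ~9 minutes per run.
print(estimated_minutes(9, n_folds=3, n_candidates=1, n_procs=1))

# Hypothetical search: 10 sampled candidates, 3 folds, 4 processors.
print(estimated_minutes(9, n_folds=3, n_candidates=10, n_procs=4))
```

Even the plain 3-fold case comes out at roughly half an hour under these assumptions, which can easily look like a hang when nothing is printed until a fold finishes.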

I should probably just use a different classifier, or drastically reduce the dimensionality of my input with PCA or something similar.
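A minimal sketch of the PCA route, assuming scikit-learn's small bundled digits dataset (8x8 images, 64 features) as a stand-in for MNIST so that it runs quickly. The choice of 16 components is an arbitrary illustration, not a tuned value.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Small stand-in dataset; swap in the MNIST X, y for the real experiment.
X_small, y_small = load_digits(return_X_y=True)

# Reduce dimensionality first, then run KNN on the projected features.
# Putting PCA inside the pipeline means it is refit on each training fold.
pca_knn = make_pipeline(
    PCA(n_components=16),      # assumed value, tune for your data
    KNeighborsClassifier(),
)

scores = cross_val_score(pca_knn, X_small, y_small, cv=3, scoring="accuracy")
print(scores)
```

Fewer dimensions make each distance computation cheaper, so the per-fold cost should drop, though the accuracy trade-off would need to be checked on the actual data.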