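I am trying to train and cross-validate sklearn's KNeighborsClassifier on MNIST. With n_jobs=None I can fit the model, but as soon as I put the model in a parallel context (for example RandomizedSearchCV or cross_val_score with n_jobs=-1), no progress is made. I have tried this both in a Jupyter notebook and in a regular Python 3.6 session. RandomForestClassifier with RandomizedSearchCV works as expected.
Sometimes I can interrupt the process with ctrl+C or by interrupting the kernel; other times I cannot and have to actually SIGKILL the process.
The following code works: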
from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()
knn_clf.fit(X, y)
But the following hangs and never finishes, even though n_jobs is not set:
from sklearn.model_selection import cross_val_score

cross_val_score(knn_clf, X, y,
                cv=3,
                scoring="accuracy",
                verbose=3)
I am running Anaconda Python 3.6.6 on Ubuntu 18.04.1 with the following conda environment:
# packages in environment at /home/user/anaconda3/envs/ml:
#
# Name Version Build Channel
absl-py 0.5.0 <pip>
asn1crypto 0.24.0 py36_0
astor 0.7.1 <pip>
backcall 0.1.0 py36_0
backports 1.0 py36_1
backports.functools_lru_cache 1.5 py_1 conda-forge
bcrypt 3.1.4 py36h14c3975_0
blas 1.0 openblas
bleach 3.0.2 py36_0
bokeh 0.13.0 py36_0
bzip2 1.0.6 h14c3975_5
ca-certificates 2018.03.07 0
certifi 2018.10.15 py36_0
cffi 1.11.5 py36he75722e_1
click 7.0 py36_0
cloudpickle 0.6.1 py36_0
cryptography 2.3.1 py36hc365091_0
cryptography-vectors 2.3.1 py36_0
cycler 0.10.0 py36_0
cytoolz 0.9.0.1 py36h14c3975_1
dask 0.19.4 py36_0
dask-core 0.19.4 py36_0
dask-glm 0.1.0 py36_0
dask-ml 0.10.0 py36_0
dbus 1.13.2 h714fa37_1
decorator 4.3.0 py36_0
distributed 1.23.3 py36_0
entrypoints 0.2.3 py36_2
eventlet 0.23.0 py36_0 conda-forge
expat 2.2.6 he6710b0_0
ffmpeg 4.0 hcdf2ecd_0
flask 1.0.2 py36_1
flask-socketio 3.0.2 py36_0
fontconfig 2.13.0 h9420a91_0
freetype 2.9.1 h8a8886c_1
futures-compat 1.0 py3_0
gast 0.2.0 <pip>
gettext 0.19.8.1 hd7bead4_3
glib 2.56.2 hd408876_0
gmp 6.1.2 h6c8ec71_1
greenlet 0.4.15 py36h7b6447c_0
grpcio 1.15.0 <pip>
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
h5py 2.8.0 py36h989c5e5_3
hdf5 1.10.2 hba1933b_1
heapdict 1.0.0 py36_2
html5lib 1.0.1 py36_0
icu 58.2 h9c2bf20_1
idna 2.7 py36_0
imageio 2.4.1 py36_0
intel-openmp 2019.0 118
ipykernel 5.1.0 py36h39e3cac_0
ipython 7.0.1 py36h39e3cac_0
ipython_genutils 0.2.0 py36_0
ipywidgets 7.4.2 py36_0
itsdangerous 0.24 py36_1
jedi 0.13.1 py36_0
jinja2 2.10 py36_0
joblib 0.12.5 py36_0
jpeg 9b h024ee3a_2
jsonschema 2.6.0 py36_0
jupyter 1.0.0 py36_7
jupyter_client 5.2.3 py36_0
jupyter_console 6.0.0 py36_0
jupyter_core 4.4.0 py36_0
Keras 2.2.2 <pip>
Keras-Applications 1.0.5 <pip>
Keras-Preprocessing 1.0.3 <pip>
kiwisolver 1.0.1 py36hf484d3e_0
libedit 3.1.20170329 h6b74fdf_2
libffi 3.2.1 hd88cf55_4
libgcc-ng 8.2.0 hdf63c60_1
libgfortran 3.0.0 1 conda-forge
libgfortran-ng 7.3.0 hdf63c60_0
libiconv 1.15 h63c8f33_5
libopenblas 0.3.3 h5a2b251_3
libopus 1.2.1 hb9ed12e_0
libpng 1.6.35 hbc83047_0
libsodium 1.0.16 h1bed415_0
libstdcxx-ng 8.2.0 hdf63c60_1
libtiff 4.0.9 he85c1e1_2
libuuid 1.0.3 h1bed415_2
libvpx 1.7.0 h439df22_0
libxcb 1.13 h1bed415_1
libxml2 2.9.8 h26e45fe_1
llvmlite 0.25.0 py36hd408876_0
locket 0.2.0 py36_1
Markdown 3.0.1 <pip>
markupsafe 1.0 py36h14c3975_1
matplotlib 3.0.0 py36h5429711_0
mistune 0.8.4 py36h7b6447c_0
mkl 2019.0 118
mkl_fft 1.0.1 py36h3010b51_0
mkl_random 1.0.1 py36h629b387_0
moviepy 0.2.3.5 <pip>
msgpack-python 0.5.6 py36h6bb024c_1
multipledispatch 0.6.0 py36_0
nbconvert 5.3.1 py36_0
nbformat 4.4.0 py36_0
ncurses 6.1 hf484d3e_0
networkx 2.2 py36_1
nomkl 3.0 0
notebook 5.7.0 py36_0
numba 0.40.0 py36h962f231_0
numpy 1.15.2 py36h99e49ec_1
numpy-base 1.15.2 py36h2f8d375_1
olefile 0.46 py36_0
openblas 0.3.3 3
openblas-devel 0.3.3 3
opencv3 3.1.0 py36_0 menpo
openssl 1.0.2p h14c3975_0
packaging 18.0 py36_0
pandas 0.23.4 py36h04863e7_0
pandoc 2.2.3.2 0
pandocfilters 1.4.2 py36_1
paramiko 2.4.2 py36_0
parso 0.3.1 py36_0
partd 0.3.9 py36_0
patsy 0.5.0 py36_0
pcre 8.42 h439df22_0
pexpect 4.6.0 py36_0
pickleshare 0.7.5 py36_0
pillow 5.3.0 py36h34e0f95_0
pip 10.0.1 py36_0
prometheus_client 0.4.2 py36_0
prompt_toolkit 2.0.6 py36_0
protobuf 3.6.1 <pip>
psutil 5.4.7 py36h14c3975_0
pthread-stubs 0.3 h0ce48e5_1
ptyprocess 0.6.0 py36_0
pyasn1 0.4.4 py36h28b3542_0
pycparser 2.19 py36_0
pygments 2.2.0 py36_0
pynacl 1.3.0 py36h7b6447c_0
pyopenssl 18.0.0 py36_0
pyparsing 2.2.2 py36_0
pyqt 5.9.2 py36h05f1152_2
python 3.6.6 h6e4f718_2
python-dateutil 2.7.3 py36_0
python-engineio 2.3.2 py36_0
python-socketio 2.0.0 py36_0
pytz 2018.5 py36_0
pywavelets 1.0.1 py36hdd07704_0
pyyaml 3.13 py36h14c3975_0
pyzmq 17.1.2 py36h14c3975_0
qt 5.9.6 h8703b6f_2
qtconsole 4.4.2 py36_0
readline 7.0 h7b6447c_5
scikit-image 0.14.0 py36hf484d3e_1
scikit-learn 0.20.0 py36h22eb022_1
scipy 1.1.0 py36he2b7bc3_1
seaborn 0.9.0 py36_0
send2trash 1.5.0 py36_0
setuptools 40.4.3 py36_0
setuptools 39.1.0 <pip>
simplegeneric 0.8.1 py36_2
sip 4.19.8 py36hf484d3e_0
six 1.11.0 py36_1
sortedcontainers 2.0.5 py36_0
sqlite 3.25.2 h7b6447c_0
statsmodels 0.9.0 py36h035aef0_0
tblib 1.3.2 py36_0
tensorboard 1.11.0 <pip>
tensorflow-gpu 1.11.0 <pip>
termcolor 1.1.0 <pip>
terminado 0.8.1 py36_1
testpath 0.4.2 py36_0
tk 8.6.8 hbc83047_0
toolz 0.9.0 py36_0
tornado 5.1.1 py36h7b6447c_0
tqdm 4.26.0 <pip>
traitlets 4.3.2 py36_0
wcwidth 0.1.7 py36_0
webencodings 0.5.1 py36_1
werkzeug 0.14.1 py36_0
wheel 0.32.1 py36_0
widgetsnbextension 3.4.2 py36_0
xorg-libxau 1.0.8 h470a237_6 conda-forge
xorg-libxdmcp 1.1.2 h470a237_7 conda-forge
xz 5.2.4 h14c3975_4
yaml 0.1.7 had09818_2
zeromq 4.2.5 hf484d3e_1
zict 0.1.3 py36_0
zlib 1.2.11 ha838bed_2
Answer 0 (score: 1)
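I think this is simply a matter of computation time: KNN on the full MNIST training set (60k vectors with 784 features) takes about 9 minutes (times n cross-validation folds, times m RandomizedSearch combinations, divided by p processors), which is far slower than most other classifiers, which finish in a matter of seconds.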
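To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes the roughly 9 minutes applies to every fit-and-score round and that the work parallelizes perfectly, neither of which is exactly true in practice. The function name and the worker count of 4 are hypothetical; 10 is the default n_iter of RandomizedSearchCV.

```python
def estimated_wall_minutes(minutes_per_fit, n_folds, n_candidates, n_workers):
    """Rough wall-clock estimate, assuming equally costly fits and perfect parallelism."""
    total_fits = n_folds * n_candidates
    return minutes_per_fit * total_fits / n_workers


# 9 minutes per round (the figure quoted above), 3-fold CV,
# 10 sampled parameter combinations, 4 workers (hypothetical machine).
print(estimated_wall_minutes(9, 3, 10, 4))  # 67.5
```

Under these assumptions even a modest search runs for over an hour, which can easily look like a hang when nothing is printed before the first fold completes.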
I should probably just use a different classifier, or greatly reduce the dimensionality of my input with PCA or something similar.
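A minimal sketch of the PCA route, not something I have benchmarked on MNIST. To keep it quick and self-contained it uses the small 8x8 digits set bundled with scikit-learn as a stand-in for MNIST, and the 95% explained-variance threshold is an assumption chosen only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Stand-in data: the small digits set shipped with scikit-learn (not MNIST).
X, y = load_digits(return_X_y=True)

# Keep enough principal components to explain about 95% of the variance
# (assumed threshold), then run KNN in the reduced space.
pca_knn = make_pipeline(PCA(n_components=0.95), KNeighborsClassifier())

scores = cross_val_score(pca_knn, X, y, cv=3, scoring="accuracy")
print(scores.mean())
```

Putting PCA inside the pipeline means it is refit on each training fold, so the held-out fold never influences the projection.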