I am currently trying to get sklearn's random forests to run in parallel on a SLURM cluster. After submitting the jobs to the compute nodes, I noticed that the parameter n_jobs=-1 no longer has any effect under SLURM.
I tried the ipyparallel package, but it gives me the error message shown below. I am not tied to ipyparallel, so I would be grateful for any module that can parallelize random forests on the cluster.
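For context, I submit the job with an sbatch script roughly like the one below (a simplified sketch; the job name, resource numbers, and time limit are placeholders, not my exact settings):

#!/bin/bash
#SBATCH --job-name=rf-train        # placeholder name
#SBATCH --nodes=1                  # placeholders: my real allocation differs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=01:00:00

source activate mvi                # conda env with sklearn, joblib, ipyparallel
srun python job.py                 # job.py calls the fit_predict method below

This is the relevant part of job.py: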
from sklearn.ensemble import RandomForestRegressor
from joblib import parallel_backend, register_parallel_backend
from ipyparallel import Client
from ipyparallel.joblib import IPythonParallelBackend
import sys
import time
import pickle
import numpy as np


def fit_predict(self, X_train, y, X_test):
    """
    Train a model on X_train and y, and then return the prediction for
    X_test.
    """
    pred = None
    # connect to the running ipyparallel cluster and expose it to joblib
    client = Client(profile='myprofile')
    bview = client.load_balanced_view()
    register_parallel_backend('ipyparallel',
                              lambda: IPythonParallelBackend(view=bview))
    regr = RandomForestRegressor(n_jobs=-1)
    try:
        # fit/predict inside the ipyparallel backend so the trees are
        # built on the cluster engines instead of only the local node
        with parallel_backend('ipyparallel'):
            regr.fit(X_train, y)
            pred = regr.predict(X_test)
    except Exception as e:
        print(e)
    return pred
Error:
Traceback (most recent call last):
  File "job.py", line 124, in <module>
    pred = rf.fit_predict(X_train, y_train, X_test)
  File "job.py", line 50, in fit_predict
    client = Client(profile='myprofile')
  File "/home/lfz/.conda/envs/mvi/lib/python3.7/site-packages/ipyparallel/client/client.py", line 419, in __init__
    raise IOError(no_file_msg)
OSError: You have attempted to connect to an IPython Cluster but no Controller could be found.
Please double-check your configuration and ensure that a cluster is running.
srun: error: c6-28: task 0: Exited with exit code 1
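If I read the error correctly, Client(profile='myprofile') expects a controller and engines for that profile to already be running, so I would presumably need something like the commands below inside the allocation before job.py starts. This is only my understanding of the ipyparallel documentation, not a setup I have working, and the engine count is a placeholder; how to wire this up correctly under SLURM is part of what I am asking:

# create the profile once; writes the ipcontroller/ipengine config files
ipython profile create --parallel --profile=myprofile

# start one controller plus N engines for the profile, then run the script
ipcluster start -n 16 --profile=myprofile --daemonize
sleep 30                           # give the engines time to register
srun python job.py
ipcluster stop --profile=myprofile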