How to parallelize sklearn's random forest regressor on SLURM

Time: 2019-08-19 07:49:31

Tags: machine-learning scikit-learn parallel-processing random-forest slurm

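I am currently trying to get sklearn's random forest to run in parallel on a SLURM cluster. After submitting the jobs to a node, I noticed that the parameter n_jobs = -1 no longer works under SLURM.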
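For a job that stays on a single node, one option is to set n_jobs from the SLURM allocation instead of using -1. The following is only a minimal sketch, assuming the job was submitted with --cpus-per-task so that SLURM exports the SLURM_CPUS_PER_TASK environment variable. The helper name slurm_n_jobs is hypothetical and is not part of sklearn or joblib.

```python
import os


def slurm_n_jobs(default=1):
    """Return the CPU count SLURM allocated to this task, or `default`.

    Assumption: the job was submitted with --cpus-per-task, so that
    SLURM_CPUS_PER_TASK is set in the environment.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    try:
        n = int(value)
    except (TypeError, ValueError):
        # Variable missing or not an integer: fall back to the default.
        return default
    return n if n > 0 else default


# Intended use (requires scikit-learn, which is not imported here):
#     regr = RandomForestRegressor(n_jobs=slurm_n_jobs())
```

This only covers parallelism within one node. Spreading the fit across several nodes still needs a distributed joblib backend such as the ipyparallel one attempted in this question.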

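I tried the ipyparallel package, but it gives me an error message. I am not tied to ipyparallel, so I would appreciate any module that can parallelize a random forest on a cluster.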

from sklearn.ensemble import RandomForestRegressor
from joblib import parallel_backend, register_parallel_backend
from ipyparallel import Client 
from ipyparallel.joblib import IPythonParallelBackend
import sys
import time
import pickle
import numpy as np 

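# Method of the model class (called as rf.fit_predict(...) in the traceback).
def fit_predict(self, X_train, y, X_test):
    """
    Train a model on X_train and y, then return the predictions for X_test.
    """
    pred = None
    # Connect to an ipyparallel cluster that must already be running under this profile.
    client = Client(profile='myprofile')
    bview = client.load_balanced_view()
    # Expose the ipyparallel view to joblib as a named backend.
    register_parallel_backend('ipyparallel', lambda: IPythonParallelBackend(view=bview))
    regr = RandomForestRegressor(n_jobs=-1)
    try:
        # Fit the trees on the ipyparallel engines instead of local workers.
        with parallel_backend('ipyparallel'):
            regr.fit(X_train, y)
        pred = regr.predict(X_test)
    except Exception as e:
        print(e)

    return pred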

Error:

Traceback (most recent call last):
  File "job.py", line 124, in <module>
    pred = rf.fit_predict(X_train, y_train, X_test)
  File "job.py", line 50, in fit_predict
    client = Client(profile='myprofile')
  File "/home/lfz/.conda/envs/mvi/lib/python3.7/site-packages/ipyparallel/client/client.py", line 419, in __init__
    raise IOError(no_file_msg)
OSError: You have attempted to connect to an IPython Cluster but no Controller could be found.
Please double-check your configuration and ensure that a cluster is running.
srun: error: c6-28: task 0: Exited with exit code 1

0 answers:

No answers yet