An attempt has been made to start a new process before the current process has finished its bootstrapping phase

Asked: 2019-03-08 06:38:30

Tags: python dask dask-distributed

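I am not new to dask, and I find it great to have a module that makes parallelization so easy. I am working on a project in which I was able to parallelize a loop on a single machine, as you can see here. However, I would like to move to dask.distributed. I made the following changes to the class above: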

diff --git a/mlchem/fingerprints/gaussian.py b/mlchem/fingerprints/gaussian.py
index ce6a72b..89f8638 100644
--- a/mlchem/fingerprints/gaussian.py
+++ b/mlchem/fingerprints/gaussian.py
@@ -6,7 +6,7 @@ from sklearn.externals import joblib
 from .cutoff import Cosine
 from collections import OrderedDict
 import dask
-import dask.multiprocessing
+from dask.distributed import Client
 import time


@@ -141,13 +141,14 @@ class Gaussian(object):
         for image in images.items():
             computations.append(self.fingerprints_per_image(image))

+        client = Client()
         if self.scaler is None:
-            feature_space = dask.compute(*computations, scheduler='processes',
+            feature_space = dask.compute(*computations, scheduler='distributed',
                                          num_workers=self.cores)
             feature_space = OrderedDict(feature_space)
         else:
             stacked_features = dask.compute(*computations,
-                                            scheduler='processes',
+                                            scheduler='distributed',
                                             num_workers=self.cores)

             stacked_features = numpy.array(stacked_features)

Doing so produces this error:

 File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

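I tried different ways of adding if __name__ == '__main__':, but none of them worked. The error can be reproduced by running this example. I would appreciate it if someone could help me solve this. I don't know how to change my code to make it work.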

Thanks.

Edit: the example is cu_training.py

1 answer:

Answer 0: (score: 1)

Creating a Client starts new processes, so the call must be inside an if __name__ == '__main__': block, as described in the linked SO question and GitHub issue.
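As a rough illustration of that advice, here is a minimal sketch of a script that only creates the Client under the main guard. The names fingerprint, build_feature_space and main are hypothetical stand-ins for the code in the question, and the worker counts are arbitrary. It relies on dask.compute picking up the active Client as its scheduler when one exists.

```python
import dask
from dask.distributed import Client


@dask.delayed
def fingerprint(image_id):
    # Hypothetical stand-in for Gaussian.fingerprints_per_image:
    # returns a (key, value) pair for one image.
    return image_id, image_id ** 2


def build_feature_space(image_ids):
    # Build the delayed computations, then evaluate them together.
    # If a Client is active, dask.compute runs on that cluster.
    computations = [fingerprint(i) for i in image_ids]
    return dict(dask.compute(*computations))


def main():
    # The Client (and the worker processes it starts) is created only here,
    # never at import time.
    with Client(n_workers=2, threads_per_worker=1):
        print(build_feature_space(range(4)))


if __name__ == "__main__":
    main()
```

In the question's case, this would probably mean creating the Client in the script that is executed (cu_training.py) and leaving the library class free of process-starting code at import time.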

This is the same requirement as with the multiprocessing module.
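For comparison, the same idiom with the standard library alone might look like the sketch below. It assumes the "spawn" start method, which launches a fresh interpreter that re-imports the main module. That is why any code that starts processes has to sit behind the guard. The names square and run_pool are hypothetical.

```python
import multiprocessing as mp


def square(x):
    # Defined at module level so spawned children can import it.
    return x * x


def run_pool(values, processes=2):
    # "spawn" does not fork: each child re-imports the main module,
    # so this function must not be called at import time.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(run_pool([1, 2, 3]))
```

Calling run_pool at module level instead of under the guard would be expected to raise the same RuntimeError shown in the question.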