I am queuing jobs with the sbatch command and the following configuration:
#SBATCH --job-name=dask-test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=80G
#SBATCH --time=00:30:00
#SBATCH --tmp=10G
#SBATCH --partition=normal
#SBATCH --qos=normal
python ./dask-test.py
The Python script is roughly as follows:
import pandas as pd
import dask.dataframe as dd
import numpy as np
from dask.distributed import Client, LocalCluster

print("Generating LocalCluster...")
cluster = LocalCluster()
print("Generating Client...")
client = Client(cluster, processes=False)
print("Scaling client...")
client.scale(8)

# BASE_DATA_SOURCE is defined elsewhere in the actual script
data = dd.read_csv(
    BASE_DATA_SOURCE + '/Data-BIGDATFILES-*.csv',
    delimiter=';',
)

def get_min_dt():
    min_dt = data.datetime.min().compute()
    print("Min is {}".format(min_dt))

print("Getting min dt...")
get_min_dt()
The first problem is that the text "Generating LocalCluster..." is printed 6 times, which makes me wonder whether the script is somehow running several times in parallel. Second, after a few minutes without printing anything, I get the following message:
/anaconda3/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37396 instead
http_address["port"], self.http_server.port
printed many times, and finally the following, also repeated many times:
Task exception was never retrieved
future: <Task finished coro=<_wrap_awaitable() done, defined at /cluster/home/user/anaconda3/lib/python3.7/asyncio/tasks.py:592> exception=RuntimeError('\n An attempt has been made to start a new process before the\n current process has finished its bootstrapping phase.\n\n This probably means that you are not using fork to start your\n child processes and you have forgotten to use the proper idiom\n in the main module:\n\n if __name__ == \'__main__\':\n freeze_support()\n ...\n\n The "freeze_support()" line can be omitted if the program\n is not going to be frozen to produce an executable.')>
Traceback (most recent call last):
File "/cluster/home/user/anaconda3/lib/python3.7/asyncio/tasks.py", line 599, in _wrap_awaitable
return (yield from awaitable.__await__())
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/core.py", line 290, in _
await self.start()
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 295, in start
response = await self.instantiate()
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 378, in instantiate
result = await self.process.start()
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/nanny.py", line 575, in start
await self.process.start()
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 34, in _call_and_set_future
res = func(*args, **kwargs)
File "/cluster/home/user/anaconda3/lib/python3.7/site-packages/distributed/process.py", line 202, in _start
process.start()
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "/cluster/home/user/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
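The error message itself describes the required idiom. As a minimal, self-contained illustration (my own sketch, unrelated to the Dask script) of what the "spawn" start method requires from the main module:

# Sketch: with the "spawn" start method, each child process re-imports the
# main module, so process creation must be guarded by __main__ or it recurses.
import multiprocessing as mp

def work(x):
    return x * 2

if __name__ == "__main__":
    mp.set_start_method("spawn")  # the default on macOS; explicit here
    with mp.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # -> [2, 4, 6]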
I have already tried adding more cores and more memory, setting processes=False when instantiating the Client, and many other things, but I cannot figure out what the problem is.
The library/software versions used are:
Am I setting something up incorrectly? Is this the right way to use the LocalCluster and Client structure?
Answer 0 (score: 1)
After some research I found the solution. The root cause was not entirely clear to me at first, but the fix is definitive: as the RuntimeError hints, when Python starts worker processes with the "spawn" method, each child re-imports the main module, so any cluster-creation code at module level is executed again in every child. The instantiation of LocalCluster and Client, and all the code after them (the code that will be executed in a distributed fashion), must therefore not sit at module level of the Python script. Instead, this code must live inside a function or in the __main__ block, like this:
import pandas as pd
import dask.dataframe as dd
import numpy as np
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    print("Generating LocalCluster...")
    cluster = LocalCluster()
    print("Generating Client...")
    client = Client(cluster, processes=False)
    print("Scaling client...")
    client.scale(8)

    # BASE_DATA_SOURCE is defined elsewhere in the actual script
    data = dd.read_csv(
        BASE_DATA_SOURCE + '/Data-BIGDATFILES-*.csv',
        delimiter=';',
    )

    def get_min_dt():
        min_dt = data.datetime.min().compute()
        print("Min is {}".format(min_dt))

    print("Getting min dt...")
    get_min_dt()
This simple change made all the difference. The solution was found in this issue thread: https://github.com/dask/distributed/issues/2520#issuecomment-470817810
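A follow-up note: once only a single cluster is started, the repeated "Port 8787 is already in use" warnings should also disappear. If you additionally want the worker count to follow the Slurm allocation instead of a hard-coded client.scale(8), here is a minimal sketch (my own, not from the issue thread; the fallback value and per-worker memory are assumptions based on the job script above, --cpus-per-task=10 and --mem=80G):

import os
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    # SLURM_CPUS_PER_TASK is set by Slurm inside the job; fall back to 10
    n_cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", 10))
    cluster = LocalCluster(
        n_workers=n_cpus,        # one single-threaded worker per allocated core
        threads_per_worker=1,
        memory_limit="8GB",      # per worker: roughly 80G total / 10 workers
    )
    client = Client(cluster)
    print(client)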