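To better understand parallelism, I am comparing a set of different code snippets.
I ran the test code on a machine with 12 GB of available memory, 200 GB of swap space, and 6 physical cores.
Here is the key part of the test code.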
start = time.time()
summed = np_hstack([lin_norm(i) for i in Y])
# without scheduler='processes'
# scheduler='threads'
dis_d = summed.compute()
print('dask delayed {}s'.format(time.time() - start))
I got this error.
distributed.utils - ERROR - Worker already exists tcp://192.168.1.111:41709
Traceback (most recent call last):
File "/home/singularli/anaconda3/lib/python3.7/site-packages/distributed/utils.py", line 714, in log_errors
yield
File "/home/singularli/anaconda3/lib/python3.7/site-packages/distributed/scheduler.py", line 1426, in add_worker
raise ValueError("Worker already exists %s" % address)
ValueError: Worker already exists tcp://192.168.1.111:41709
distributed.core - ERROR - Worker already exists tcp://192.168.1.111:41709
Traceback (most recent call last):
File "/home/singularli/anaconda3/lib/python3.7/site-packages/distributed/core.py", line 412, in handle_comm
result = yield result
File "/home/singularli/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/home/singularli/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
yielded = next(result)
File "/home/singularli/anaconda3/lib/python3.7/site-packages/distributed/scheduler.py", line 1426, in add_worker
raise ValueError("Worker already exists %s" % address)
ValueError: Worker already exists tcp://192.168.1.111:41709
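More code and discussion are in "Why does Dask perform so slower while multiprocessing perform so much faster?"
This is my own machine, and nobody else is using it. No job or process is using this worker or port, so the error message does not explain the real cause.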
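For reference, here is a minimal sketch of one way to check whether anything is listening on the port. It uses only the standard library; the helper name `port_in_use` is hypothetical, and the address and port are simply copied from the error message. This is only a partial check: the traceback shows the error being raised inside the scheduler's `add_worker`, so a port with no listener does not by itself rule out an address the scheduler still has registered.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on (host, port).

    Hypothetical helper for illustration; it only detects a listening
    socket, not what the Dask scheduler believes about its workers.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Address and port taken from the "Worker already exists" message
    print(port_in_use("192.168.1.111", 41709))
```

If this reports that the port is free while the error still appears, the duplicate registration is probably happening inside the same Python session rather than coming from another process, but that is a guess on my part.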
Any ideas about this error?