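I have a dask-scheduler running in a docker container on machine A.
I have dask-workers running in a docker container on machine B (8 CPUs).
I am getting the "distributed.client - WARNING - Couldn't gather 1 keys, rescheduling" error, and found the issue discussed at: https://github.com/dask/distributed/pull/1278
My understanding is that I need to set both of these parameters: --contact-address and --listen-address
However, I do not know what these should be.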
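To make the question concrete, here is my best guess at what setting both flags might look like for a worker in a bridge-networked container. This is an untested sketch: port 9000 and the 0.0.0.0 bind address are my own assumptions, and the script only prints the command instead of running it.

```shell
# Assumed values: the IPs come from this post, the port is an arbitrary guess.
HOST_IP=192.16.3.98               # physical address of the worker machine
WORKER_PORT=9000                  # assumed free port, reachable from the scheduler
SCHEDULER=tcp://192.16.3.10:8786  # physical address of the scheduler machine

# --listen-address:  the address the worker binds to inside the container
# --contact-address: the address the worker advertises to the scheduler and other workers
CMD="dask-worker $SCHEDULER --listen-address tcp://0.0.0.0:$WORKER_PORT --contact-address tcp://$HOST_IP:$WORKER_PORT"

# Print the full docker invocation for inspection; nothing is started here.
echo "docker run -it -p $WORKER_PORT:$WORKER_PORT daskdev/dask $CMD"
```

If the container uses --network host, the port mapping is presumably unnecessary, since the worker already binds the host's own address.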
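Below is the output of the scheduler and the workers.
When running the workers, I used the physical address of the dask-scheduler.
However, the scheduler prints other IP addresses. I tried using those instead, but it did not seem to help.
Using the dask-docker repository
In a terminal, I ran: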
$ docker-compose up
The scheduler container is running on the machine with physical IP address 192.16.3.10
The terminal output is:
scheduler_1 | distributed.scheduler - INFO - -----------------------------------------------
worker_1 | distributed.nanny - INFO - Start Nanny at: 'tcp://192.18.0.2:42383'
scheduler_1 | distributed.scheduler - INFO - Clear task state
scheduler_1 | distributed.scheduler - INFO - Scheduler at: tcp://192.18.0.4:8786
scheduler_1 | distributed.scheduler - INFO - bokeh at: :8787
scheduler_1 | distributed.scheduler - INFO - Local Directory: /tmp/scheduler-sdf8azmg
scheduler_1 | distributed.scheduler - INFO - -----------------------------------------------
worker_1 | distributed.diskutils - INFO - Found stale lock file and directory '/worker-hs49w9fj', purging
worker_1 | distributed.worker - INFO - Start worker at: tcp://179.18.0.2:42883
worker_1 | distributed.worker - INFO - Listening to: tcp://179.18.0.2:42883
worker_1 | distributed.worker - INFO - nanny at: 179.18.0.2:42383
worker_1 | distributed.worker - INFO - bokeh at: 179.18.0.2:40741
worker_1 | distributed.worker - INFO - Waiting to connect to: tcp://scheduler:8786
worker_1 | distributed.worker - INFO - -------------------------------------------------
worker_1 | distributed.worker - INFO - Threads: 6
worker_1 | distributed.worker - INFO - Memory: 2.10 GB
worker_1 | distributed.worker - INFO - Local Directory: /worker-az7a16hp
worker_1 | distributed.worker - INFO - -------------------------------------------------
scheduler_1 | distributed.scheduler - INFO - Register tcp://192.18.0.2:42883
scheduler_1 | distributed.scheduler - INFO - Starting worker compute stream, tcp://192.18.0.2:42883
worker_1 | distributed.worker - INFO - Registered to: tcp://scheduler:8786
worker_1 | distributed.worker - INFO - -------------------------------------------------
On my worker machine, with physical IP address 192.16.3.98, I ran:
$ docker run -it --network host daskdev/dask dask-worker tcp://192.16.3.10:8786 --nprocs 8
The terminal output was:
+ dask-worker tcp://192.16.3.10:8786 --nprocs 8
distributed.nanny - INFO - Start Nanny at: 'tcp://192.16.3.98:43684'
distributed.nanny - INFO - Start Nanny at: 'tcp://192.16.3.98:36592'
distributed.nanny - INFO - Start Nanny at: 'tcp://192.16.3.98:36824'
distributed.nanny - INFO - Start Nanny at: 'tcp://192.16.3.98:45223'
distributed.nanny - INFO - Start Nanny at: 'tcp://192.16.3.98:36275'
distributed.nanny - INFO - Start Nanny at: 'tcp://192.16.3.98:34367'
distributed.nanny - INFO - Start Nanny at: 'tcp://192.16.3.98:33851'
distributed.nanny - INFO - Start Nanny at: 'tcp://192.16.3.98:42186'
distributed.worker - INFO - Start worker at: tcp://192.16.3.98:44199
distributed.worker - INFO - Listening to: tcp://192.16.3.98:44199
distributed.worker - INFO - nanny at: 192.16.3.98:45223
distributed.worker - INFO - bokeh at: 192.16.3.98:38916
distributed.worker - INFO - Waiting to connect to: tcp://192.16.3.10:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 1
distributed.worker - INFO - Memory: 4.19 GB
distributed.worker - INFO - Local Directory: /worker-3w1rygoa
distributed.worker - INFO - -------------------------------------------------
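The worker above connects, but I still get the error mentioned earlier.
Is there a default value I should use for --contact-address and --listen-address?
If I want multiple workers on the same machine, do I need to run multiple containers?
Any help or guidance would be greatly appreciated.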