Which address:port do you use for --listen-address and --contact-address when running a dask scheduler from an external docker container?

Time: 2019-04-14 03:02:59

Tags: docker dask dask-distributed

I have a dask-scheduler running in a docker container on machine A.

I have dask-workers running in a docker container on machine B (8 CPUs).

I am getting the error "distributed.client - WARNING - Couldn't gather 1 keys, rescheduling" and found the issue discussed here: https://github.com/dask/distributed/pull/1278

My understanding is that I need to set both of the following parameters: --contact-address and --listen-address
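
To make the two parameters concrete, here is a minimal sketch of how a dask-worker command line could combine them for a worker in a bridged container: bind on all interfaces inside the container, but advertise the host's physical address. The helper name `worker_command` and port 9000 are hypothetical; the sketch assumes that port is published from the container to the host (for example with `-p 9000:9000`) and that a single worker process is started, since a fixed port cannot be shared by several processes. Whether this resolves the error above is not confirmed.

```python
def worker_command(scheduler, host_ip, port):
    """Build a dask-worker argument list (illustrative helper, not part of dask).

    --listen-address: the address the worker binds to inside the container.
    --contact-address: the address the worker advertises to the scheduler
    and to other workers.
    """
    listen = f"tcp://0.0.0.0:{port}"       # bind on every container interface
    contact = f"tcp://{host_ip}:{port}"    # advertise the host's physical IP
    return [
        "dask-worker", scheduler,
        "--listen-address", listen,
        "--contact-address", contact,
    ]


# Scheduler and worker host addresses are taken from the question;
# port 9000 is an assumed, published port.
cmd = worker_command("tcp://192.16.3.10:8786", "192.16.3.98", 9000)
print(" ".join(cmd))
```

With `--network host`, as in the worker command further down, the container shares the host's interfaces, so the listen and contact addresses would normally coincide.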

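However, I do not know what these should be.

Below is the output from the scheduler and the workers.

When running the worker, I used the physical address of the dask-scheduler.

However, the scheduler prints different IP addresses. I tried using those, but it did not seem to help.

Using the dask-docker repository,

in a terminal, I ran: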

$ docker-compose up

The dask-scheduler container runs on a machine whose physical IP address is 192.16.3.10.

The terminal output is:

scheduler_1  | distributed.scheduler - INFO - -----------------------------------------------
worker_1     | distributed.nanny - INFO -         Start Nanny at: 'tcp://192.18.0.2:42383'
scheduler_1  | distributed.scheduler - INFO - Clear task state
scheduler_1  | distributed.scheduler - INFO -   Scheduler at:     tcp://192.18.0.4:8786
scheduler_1  | distributed.scheduler - INFO -       bokeh at:                     :8787
scheduler_1  | distributed.scheduler - INFO - Local Directory:    /tmp/scheduler-sdf8azmg
scheduler_1  | distributed.scheduler - INFO - -----------------------------------------------
worker_1     | distributed.diskutils - INFO - Found stale lock file and directory '/worker-hs49w9fj', purging
worker_1     | distributed.worker - INFO -       Start worker at:     tcp://179.18.0.2:42883
worker_1     | distributed.worker - INFO -          Listening to:     tcp://179.18.0.2:42883
worker_1     | distributed.worker - INFO -              nanny at:           179.18.0.2:42383
worker_1     | distributed.worker - INFO -              bokeh at:           179.18.0.2:40741
worker_1     | distributed.worker - INFO - Waiting to connect to:       tcp://scheduler:8786
worker_1     | distributed.worker - INFO - -------------------------------------------------
worker_1     | distributed.worker - INFO -               Threads:                          6
worker_1     | distributed.worker - INFO -                Memory:                    2.10 GB
worker_1     | distributed.worker - INFO -       Local Directory:           /worker-az7a16hp
worker_1     | distributed.worker - INFO - -------------------------------------------------
scheduler_1  | distributed.scheduler - INFO - Register tcp://192.18.0.2:42883
scheduler_1  | distributed.scheduler - INFO - Starting worker compute stream, tcp://192.18.0.2:42883
worker_1     | distributed.worker - INFO -         Registered to:       tcp://scheduler:8786
worker_1     | distributed.worker - INFO - -------------------------------------------------

On my worker machine, whose physical IP address is 192.16.3.98, I ran:

$ docker run -it --network host daskdev/dask dask-worker tcp://192.16.3.10:8786 --nprocs 8

The terminal output is:

+ dask-worker tcp://192.16.3.10:8786 --nprocs 8
distributed.nanny - INFO -         Start Nanny at: 'tcp://192.16.3.98:43684'
distributed.nanny - INFO -         Start Nanny at: 'tcp://192.16.3.98:36592'
distributed.nanny - INFO -         Start Nanny at: 'tcp://192.16.3.98:36824'
distributed.nanny - INFO -         Start Nanny at: 'tcp://192.16.3.98:45223'
distributed.nanny - INFO -         Start Nanny at: 'tcp://192.16.3.98:36275'
distributed.nanny - INFO -         Start Nanny at: 'tcp://192.16.3.98:34367'
distributed.nanny - INFO -         Start Nanny at: 'tcp://192.16.3.98:33851'
distributed.nanny - INFO -         Start Nanny at: 'tcp://192.16.3.98:42186'
distributed.worker - INFO -       Start worker at:    tcp://192.16.3.98:44199
distributed.worker - INFO -          Listening to:    tcp://192.16.3.98:44199
distributed.worker - INFO -              nanny at:          192.16.3.98:45223
distributed.worker - INFO -              bokeh at:          192.16.3.98:38916
distributed.worker - INFO - Waiting to connect to:     tcp://192.16.3.10:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -                Memory:                    4.19 GB
distributed.worker - INFO -       Local Directory:           /worker-3w1rygoa
distributed.worker - INFO - -------------------------------------------------

The worker above connects, but I get the error mentioned earlier.
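
One way to narrow this down might be to check, from each machine, whether the worker addresses registered by the scheduler can actually be reached over TCP. The sketch below uses only the Python standard library; the function name `can_connect` is hypothetical, and an unreachable worker address is only one possible cause of the warning.

```python
import socket
from urllib.parse import urlparse


def can_connect(address, timeout=2.0):
    """Return True if a TCP connection to an address such as
    'tcp://192.16.3.98:44199' can be opened within `timeout` seconds."""
    parsed = urlparse(address)
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `can_connect("tcp://192.18.0.2:42883")` on machine B would show whether the worker started by docker-compose on machine A is reachable from there at the address the scheduler registered.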

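Is there a default address that I should use for --contact-address and --listen-address?

If I want multiple workers on the same machine, do I need to run multiple containers?

Any help or guidance would be greatly appreciated.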

0 answers:

No answers