Workers connect, but the computation fails

Asked: 2017-08-25 13:03:17

Tags: dask-distributed

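I have a dask-worker connected to a dask-scheduler. The problem starts after I submit tasks. As far as I can tell from the task stream, the workers do execute computations. The error log from the dask worker is long and I don't understand it: it says timeout, connection refused? Which connection was refused? AFAIK there is no firewall between the two machines (they are on a LAN).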

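Note that the same or similar errors occur over and over. Eventually the computation fails with "ValueError: Could not find dependent array-original-0effb3cc096e32a82e95557c88b795fd. Check worker logs".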

distributed.nanny - INFO -         Start Nanny at: 'tcp://10.0.0.42:36199'
distributed.worker - INFO -       Start worker at:      tcp://10.0.0.42:44304
distributed.worker - INFO -              bokeh at:            10.0.0.42:8789
distributed.worker - INFO -               http at:            10.0.0.42:40349
distributed.worker - INFO -              nanny at:            10.0.0.42:36199
distributed.worker - INFO - Waiting to connect to:       tcp://10.0.0.50:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                         40
distributed.worker - INFO -                Memory:                  121.64 GB
distributed.worker - INFO -       Local Directory:            worker-qdz2_s09
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -         Registered to:             tcp://10.0.0.50:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - ERROR - Worker stream died during communication: tcp://127.0.0.1:34876
Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 185, in connect
    quiet_exceptions=EnvironmentError)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
tornado.gen.TimeoutError: Timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/worker.py", line 1617, in gather_dep
    who=self.address)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 479, in send_recv_from_rpc
    comm = yield self.pool.connect(self.addr)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 583, in connect
    connection_args=self.connection_args)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 194, in connect
    _raise(error)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 177, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: in <distributed.comm.tcp.TCPConnector object at 0x7fcbfc5e6f98>: ConnectionRefusedError: [Errno 111] Connection refused
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 297, 0, 0)
distributed.worker - INFO - Dependent not found: array-original-7a8cba4415f43af718833379b651ccb6 0 .  Asking scheduler
distributed.worker - INFO - Dependent not found: array-original-0effb3cc096e32a82e95557c88b795fd 0 .  Asking scheduler
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 263, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 292, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 256, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 278, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 284, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 275, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 285, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 301, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 295, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 303, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 271, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 281, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 287, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 305, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 282, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 173, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 178, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 190, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 185, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 195, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 194, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 177, 0, 0)
distributed.worker - ERROR - Worker stream died during communication: tcp://127.0.0.1:34876
Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 185, in connect
    quiet_exceptions=EnvironmentError)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
tornado.gen.TimeoutError: Timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/worker.py", line 1617, in gather_dep
    who=self.address)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 479, in send_recv_from_rpc
    comm = yield self.pool.connect(self.addr)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 583, in connect
    connection_args=self.connection_args)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 194, in connect
    _raise(error)
  File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 177, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: in <distributed.comm.tcp.TCPConnector object at 0x7fcbfc50b4a8>: ConnectionRefusedError: [Errno 111] Connection refused
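
One detail in the log above may help narrow down which connection is refused: the worker starts at tcp://10.0.0.42:44304 and registers to tcp://10.0.0.50:8786, but the failing peer is tcp://127.0.0.1:34876, a loopback address. A loopback address on one machine is not reachable from another, so this may mean that some other worker is advertising itself as 127.0.0.1; that is an assumption, not something the log confirms. The following is a minimal sketch, using only the Python standard library, that scans a worker log for loopback peers on error lines. The helper name `loopback_peers` and the abbreviated `sample` excerpt are illustrative and not part of dask.

```python
import ipaddress
import re

# Matches addresses such as tcp://10.0.0.42:44304 in a worker log line.
ADDRESS_RE = re.compile(r"tcp://(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def loopback_peers(log_text):
    """Return sorted loopback tcp:// addresses found on ERROR/OSError lines.

    Hypothetical helper for reading a log; it does not talk to a cluster.
    """
    found = set()
    for line in log_text.splitlines():
        # Only error lines are of interest; INFO lines list local endpoints.
        if "ERROR" not in line and "OSError" not in line:
            continue
        for host, port in ADDRESS_RE.findall(line):
            try:
                is_loopback = ipaddress.ip_address(host).is_loopback
            except ValueError:
                # Not a valid IPv4 address (e.g. an octet above 255); skip it.
                continue
            if is_loopback:
                found.add("tcp://{}:{}".format(host, port))
    return sorted(found)


# Abbreviated excerpt of the log above, used only to exercise the helper.
sample = """\
distributed.worker - INFO -       Start worker at:      tcp://10.0.0.42:44304
distributed.worker - INFO -         Registered to:             tcp://10.0.0.50:8786
distributed.worker - ERROR - Worker stream died during communication: tcp://127.0.0.1:34876
OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s
"""

print(loopback_peers(sample))
```

If loopback peers show up when this is run on the full log, it is probably worth checking which address each worker advertises to the scheduler, for example via the `--host` option of `dask-worker`.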

0 answers:

No answers yet