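I have a dask-worker connected to a dask-scheduler. Problems occur after I submit a task. As far as I can tell from the task stream, the worker does perform computations. The error log from the dask worker is long and I don't understand it: it says Timeout, and Connection refused? Which connection is being refused? AFAIK there is no firewall between the two machines (they are on a LAN).
Note that the same or similar errors occur over and over again. Eventually the computation fails with "ValueError: Could not find dependent array-original-0effb3cc096e32a82e95557c88b795fd. Check worker logs"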
distributed.nanny - INFO - Start Nanny at: 'tcp://10.0.0.42:36199'
distributed.worker - INFO - Start worker at: tcp://10.0.0.42:44304
distributed.worker - INFO - bokeh at: 10.0.0.42:8789
distributed.worker - INFO - http at: 10.0.0.42:40349
distributed.worker - INFO - nanny at: 10.0.0.42:36199
distributed.worker - INFO - Waiting to connect to: tcp://10.0.0.50:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 40
distributed.worker - INFO - Memory: 121.64 GB
distributed.worker - INFO - Local Directory: worker-qdz2_s09
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Registered to: tcp://10.0.0.50:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - ERROR - Worker stream died during communication: tcp://127.0.0.1:34876
Traceback (most recent call last):
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 185, in connect
quiet_exceptions=EnvironmentError)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
tornado.gen.TimeoutError: Timeout
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/worker.py", line 1617, in gather_dep
who=self.address)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
yielded = self.gen.throw(*exc_info)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 479, in send_recv_from_rpc
comm = yield self.pool.connect(self.addr)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
yielded = self.gen.throw(*exc_info)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 583, in connect
connection_args=self.connection_args)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
yielded = self.gen.throw(*exc_info)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 194, in connect
_raise(error)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 177, in _raise
raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: in <distributed.comm.tcp.TCPConnector object at 0x7fcbfc5e6f98>: ConnectionRefusedError: [Errno 111] Connection refused
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 297, 0, 0)
distributed.worker - INFO - Dependent not found: array-original-7a8cba4415f43af718833379b651ccb6 0 . Asking scheduler
distributed.worker - INFO - Dependent not found: array-original-0effb3cc096e32a82e95557c88b795fd 0 . Asking scheduler
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 263, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 292, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 256, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 278, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 284, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 275, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 285, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 301, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 295, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 303, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 271, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 281, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 287, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 305, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 282, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 173, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 178, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 190, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 185, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 195, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 194, 0, 0)
distributed.worker - INFO - Can't find dependencies for key ('array-concatenate-39749c96029f622599cd35ec80ca507c', 177, 0, 0)
distributed.worker - ERROR - Worker stream died during communication: tcp://127.0.0.1:34876
Traceback (most recent call last):
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 185, in connect
quiet_exceptions=EnvironmentError)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
tornado.gen.TimeoutError: Timeout
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/worker.py", line 1617, in gather_dep
who=self.address)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
yielded = self.gen.throw(*exc_info)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 479, in send_recv_from_rpc
comm = yield self.pool.connect(self.addr)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
yielded = self.gen.throw(*exc_info)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/core.py", line 583, in connect
connection_args=self.connection_args)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run
value = future.result()
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run
yielded = self.gen.throw(*exc_info)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 194, in connect
_raise(error)
File "/home/paul/anaconda3/envs/ecopy/lib/python3.5/site-packages/distributed/comm/core.py", line 177, in _raise
raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: in <distributed.comm.tcp.TCPConnector object at 0x7fcbfc50b4a8>: ConnectionRefusedError: [Errno 111] Connection refused
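To see which connection is being refused, the peer addresses can be pulled out of the worker log. The following is a minimal sketch, not part of dask: `refused_peers` and `PEER_RE` are hypothetical names, and the regular expression assumes the message format shown in the log above. It also reports whether each address is a loopback address, since 127.0.0.1 always refers to the local machine rather than to another host on the LAN.

```python
import ipaddress
import re

# Hypothetical helper, not a dask API. Matches the failing peer in lines such as
# "OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: ..."
PEER_RE = re.compile(r"trying to connect to 'tcp://([0-9.]+):(\d+)'")


def refused_peers(log_text):
    """Return a set of (host, port, is_loopback) for every failed connect in the log."""
    peers = set()
    for host, port in PEER_RE.findall(log_text):
        is_loopback = ipaddress.ip_address(host).is_loopback
        peers.add((host, int(port), is_loopback))
    return peers


# One line copied from the worker log above.
sample = (
    "OSError: Timed out trying to connect to 'tcp://127.0.0.1:34876' after 3.0 s: "
    "in <distributed.comm.tcp.TCPConnector object at 0x7fcbfc5e6f98>: "
    "ConnectionRefusedError: [Errno 111] Connection refused"
)

for host, port, is_loopback in sorted(refused_peers(sample)):
    kind = "loopback (local machine)" if is_loopback else "remote address"
    print("{}:{} is a {}".format(host, port, kind))
```

On this log the only refused peer is 127.0.0.1:34876, which is not the address this worker itself listens on (10.0.0.42:44304).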