I am using dask 1.1.1 (the latest version) and started a dask scheduler from the command line with:
$ dask-scheduler --port 9796 --bokeh-port 9797 --bokeh-prefix my_project
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://10.1.0.107:9796
distributed.scheduler - INFO - bokeh at: :9797
distributed.scheduler - INFO - Local Directory: /tmp/scheduler-pdnwslep
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Register tcp://10.1.25.4:36310
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.1.25.4:36310
distributed.core - INFO - Starting established connection
Then... I tried to start a client to connect to the scheduler with the following code:
from dask.distributed import Client
c = Client('10.1.0.107:9796', set_as_default=False)
But when I try this, I get the error:
...
File "/root/anaconda3/lib/python3.7/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
tornado.gen.TimeoutError: Timeout
During handling of the above exception, another exception occurred:
...
File "/root/anaconda3/lib/python3.7/site-packages/distributed/comm/core.py", line 195, in _raise
raise IOError(msg)
OSError: Timed out trying to connect to 'tcp://10.1.0.107:9796' after 10 s: connect() didn't finish in time
This is hard-coded into a system that has been running for months, so I'm writing this question mainly to verify that I'm not doing anything wrong programmatically. I suspect something in the environment must be broken. Does everything look correct to you? Outside of dask and Python, what kinds of things could block a connection like this? Certificates? Mismatched package versions? Any ideas?
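For anyone debugging the same timeout: before suspecting dask itself, it can help to verify plain TCP reachability of the scheduler port, which rules out firewalls, routing, and a scheduler that isn't actually listening. A minimal sketch (the host and port below are the ones from the question; substitute your own):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the scheduler address from the question.
# if not can_connect('10.1.0.107', 9796):
#     print('port unreachable: firewall, routing, or scheduler not listening')
```

If this returns False while the scheduler log shows it running, the problem is in the network layer, not in the dask client code.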
Answer 0 (score: 1)
(See the related comments.)
What follows is our wrapper around dask, mainly to bake in our particular configuration and make it easy to use with docker containers on our systems:
''' daskwrapper: easy access to distributed computing '''
import webbrowser
from dask.distributed import Client as DaskClient
from . import config

scheduler_config = {  # from yaml
    "scheduler_hostname": "schedulermachine.corpdomain.com",
    "scheduler_ip": "10.0.0.1"}
worker_config = {  # from yaml
    "environments": {
        "generic": {
            "scheduler_port": 9796,
            "dashboard_port": 9797,
            "worker_port": 67176}}}
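As an aside, 67176 is outside the valid TCP port range (1-65535), so a worker told to bind or advertise that port would fail; it's worth ruling out when a config like this is involved. A small sanity check over a config shaped like the one above (the dict in the test is a trimmed copy for illustration, not the full config):

```python
def validate_ports(worker_config: dict) -> None:
    """Raise ValueError for any port outside the valid TCP range 1-65535."""
    for env, ports in worker_config["environments"].items():
        for name, port in ports.items():
            if not 1 <= port <= 65535:
                raise ValueError(f"{env}.{name}={port} is not a valid TCP port")

# The 'generic' environment above would fail this check because of
# worker_port=67176.
```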
class Client():
    def __init__(self, environment: str):
        (
            self.scheduler_hostname,
            self.scheduler_port,
            self.dashboard_port,
            self.scheduler_address) = self.get_scheduler_details(environment)
        self.client = DaskClient(self.scheduler_address, asynchronous=False)

    def get_scheduler_details(self, environment: str) -> tuple:
        ''' gets it from a map of available docker images... '''
        envs = worker_config['environments']
        return (
            scheduler_config['scheduler_hostname'],
            envs[environment]['scheduler_port'],
            envs[environment]['dashboard_port'],
            (
                f"{scheduler_config['scheduler_hostname']}:"
                f"{envs[environment]['scheduler_port']}"))

    def open_status(self):
        webbrowser.open_new_tab(self.get_status())

    def get_status(self):
        return f'http://{self.scheduler_hostname}:{self.dashboard_port}/status'

    def get_async_client(self):
        ''' returns a client instance so the user can use it directly '''
        return DaskClient(self.scheduler_address, asynchronous=True)

    def get(self, workflow: dict, tasks: 'str|list'):
        return self.client.get(workflow, tasks)

    async def submit(self, function: callable, args: list):
        ''' saved as example dask api '''
        if not isinstance(args, (list, tuple)):
            args = [args]
        async with DaskClient(self.scheduler_address, asynchronous=True) as client:
            future = client.submit(function, *args)
            result = await future
            return result

    def close(self):
        return self.client.close()
That is the client, and it is used like this:
from daskwrapper import Client
dag = {'some_task': (some_task_function, )}
workers = Client(environment='some_environment')
workers.get(workflow=dag, tasks='some_task')
workers.close()
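For readers unfamiliar with the workflow dict passed to get: it follows dask's task-graph spec, where each value is a tuple of a callable followed by its arguments, and an argument that names another key in the graph is computed first. A minimal pure-Python evaluator of that spec, for illustration only (this is a sketch, not dask's actual implementation, which also handles nesting, caching, and parallelism):

```python
def tiny_get(graph: dict, key):
    """Recursively evaluate one key of a dask-style task graph."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # Arguments that name other graph keys are resolved first.
        resolved = [
            tiny_get(graph, a) if (isinstance(a, str) and a in graph) else a
            for a in args]
        return func(*resolved)
    return task  # plain data, not a task

# Example mirroring the usage above:
dag = {
    'x': 1,
    'y': 2,
    'some_task': (lambda a, b: a + b, 'x', 'y'),
}
# tiny_get(dag, 'some_task') -> 3
```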
The scheduler is started like this:
import asyncio
from threading import Thread
from distributed import Scheduler

def start():
    def start_scheduler(port, dashboard_port):
        async def f():
            s = Scheduler(
                port=port,
                dashboard_address=f"0.0.0.0:{dashboard_port}")
            s = await s
            await s.finished()
        asyncio.get_event_loop().run_until_complete(f())

    # 'configs' is an internal config loader (not shown here)
    worker_config = configs.get(repo='spartan_worker')
    envs = worker_config['environments']
    for key, value in envs.items():
        port = value['scheduler_port']
        dashboard_port = str(value['dashboard_port'])
        thread = Thread(
            target=start_scheduler,
            args=(port, dashboard_port))
        thread.start()
And the worker:
import asyncio
from distributed import Worker

def start(
    scheduler_address: str,
    scheduler_port: int,
    worker_address: str,
    worker_port: int
):
    async def f(scheduler_address):
        w = await Worker(
            scheduler_address,
            port=worker_port,
            contact_address=f'{worker_address}:{worker_port}')
        await w.finished()
    asyncio.get_event_loop().run_until_complete(f(
        f'tcp://{scheduler_address}:{scheduler_port}'))
This may not directly help you with this exact problem, but I believe we haven't had that issue since we dockerized everything. A lot is missing here, but these are the basics. There are probably better ways to get dedicated environments for distributed computing that are this convenient, but this fits our needs.