Simple example of deploying on a cluster with Ray (the guide does not work)

Time: 2019-06-02 04:34:45

Tags: python ray

I have two servers, each with one GPU. I want to run a reinforcement learning algorithm that uses both servers at the same time through Ray.

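I imagine one of the servers acting as the main data store and running the main driver process, which adjusts the neural network weights based on the results it receives from the other server.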

Following this quick start guide and using this cluster file, I get the following output:

2019-06-02 04:29:47,169 INFO node_provider.py:34 -- ClusterState: Loaded cluster state: {}
2019-06-02 04:29:47,170 INFO node_provider.py:59 -- ClusterState: Writing cluster state: {'YOUR_HEAD_NODE_HOSTNAME': {'tags': {'ray-node-type': 'head'}, 'state': 'terminated'}}
This will create a new cluster [y/N]: y
2019-06-02 04:29:49,023 INFO commands.py:189 -- get_or_create_head_node: Launching new head node...
2019-06-02 04:29:49,024 INFO node_provider.py:77 -- ClusterState: Writing cluster state: {'YOUR_HEAD_NODE_HOSTNAME': {'tags': {'ray-node-type': 'head', 'ray-launch-config': '5a0ccc99d6349f2fb9699284ae2a3547c548975f', 'ray-node-name': 'ray-default-head'}, 'state': 'running'}}
2019-06-02 04:29:49,024 INFO commands.py:202 -- get_or_create_head_node: Updating files on head node...
Traceback (most recent call last):
  File "/usr/local/bin/ray", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/ray/scripts/scripts.py", line 771, in main
    return cli()
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/scripts/scripts.py", line 462, in create_or_update
    no_restart, restart_only, yes, cluster_name)
  File "/usr/local/lib/python3.6/dist-packages/ray/autoscaler/commands.py", line 47, in create_or_update_cluster
    override_cluster_name)
  File "/usr/local/lib/python3.6/dist-packages/ray/autoscaler/commands.py", line 241, in get_or_create_head_node
    initialization_commands=config["initialization_commands"],
KeyError: 'initialization_commands'

Do you know what is going wrong here? Ideally, I would like a super simple example of how to set this up.
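To narrow this down, here is a minimal sketch of a check on the parsed cluster file. It rests on two observations from the output above: the KeyError names `initialization_commands`, so the loaded config apparently has no such top-level key, and the log still shows the literal `YOUR_HEAD_NODE_HOSTNAME`, which looks like an unreplaced placeholder. The helper `find_config_problems`, the `YOUR_` prefix rule, and the `example` dict (including the `provider`, `head_ip`, and `worker_ips` field names) are assumptions made for illustration; none of them are part of Ray.

```python
# Hypothetical helper, not part of Ray: inspects an already-parsed cluster config dict.
REQUIRED_KEYS = ("initialization_commands",)  # the key named in the KeyError above
PLACEHOLDER_PREFIX = "YOUR_"  # assumed marker, e.g. 'YOUR_HEAD_NODE_HOSTNAME' in the log


def find_config_problems(config):
    """Return a list of human-readable problems found in the config dict."""
    problems = []
    # Report top-level keys that the traceback suggests the CLI indexes directly.
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append("missing top-level key: " + key)
    # Report provider fields that still hold template placeholders.
    provider = config.get("provider", {})
    for name, value in provider.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, str) and item.startswith(PLACEHOLDER_PREFIX):
                problems.append("placeholder left in provider.%s: %s" % (name, item))
    return problems


# Assumed shape of the cluster file after parsing; the field names are illustrative.
example = {
    "cluster_name": "default",
    "provider": {
        "type": "local",
        "head_ip": "YOUR_HEAD_NODE_HOSTNAME",
        "worker_ips": [],
    },
}

for problem in find_config_problems(example):
    print(problem)
```

If the real cluster file reports the same two problems, adding an `initialization_commands` entry (possibly an empty list) and replacing the placeholder hostname with the actual head node address would be the first things to try. Whether that is enough for this Ray version is untested here.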

0 answers:

No answers