
Distributed TensorFlow: which ps/worker hosts to use on AWS?

Asked: 2017-12-14 01:56:22

Tags: python tensorflow tensorflow-gpu

I am using distributed TensorFlow on AWS with GPUs. When I train the model on my local machine, I specify ps_host / workers_host as something like 'localhost:2225'. What ps/worker hosts do I need to use on AWS?

2 answers:

Answer 0 (score: 2)

Here is a good GitHub project that shows how to use distributed TensorFlow on AWS with Kubernetes or the new AWS SageMaker: https://github.com/pipelineai/pipeline

At a minimum, you should use the TensorFlow Estimator API. Distributed TensorFlow has a lot of hidden, poorly documented tricks.

Some of the better examples are here: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census

Answer 1 (score: 0)

When you run distributed TF code on a cluster, the other nodes can be reached via "private ip:port number".

The problem with AWS, though, is that the other nodes cannot be launched easily; it takes some extra configuration.
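To illustrate the "private ip:port number" form, here is a minimal sketch of how the host lists might be written for AWS instead of 'localhost:2225'. The IP addresses, the port, and the helper names `build_cluster` and `build_tf_config` are made up for this example; it also assumes the instances' security group allows traffic between nodes on the chosen port. The resulting dict has the shape that `tf.train.ClusterSpec` accepts, and the JSON string has the shape of the `TF_CONFIG` environment variable read by the Estimator API (Estimator setups usually also need a chief/master entry, which is left out here).

```python
import json

# Hypothetical private IPs of the EC2 instances; replace with your own.
PS_HOSTS = ["10.0.0.10:2222"]
WORKER_HOSTS = ["10.0.0.11:2222", "10.0.0.12:2222"]


def build_cluster(ps_hosts, worker_hosts):
    """Return a cluster dict of the form {"ps": [...], "worker": [...]}.

    A dict of this shape can be passed to tf.train.ClusterSpec.
    """
    return {"ps": list(ps_hosts), "worker": list(worker_hosts)}


def build_tf_config(cluster, job_name, task_index):
    """Return a TF_CONFIG-style JSON string for one node of the cluster."""
    if job_name not in cluster:
        raise ValueError("unknown job name: %s" % job_name)
    if not 0 <= task_index < len(cluster[job_name]):
        raise ValueError("task index out of range for job %s" % job_name)
    return json.dumps({
        "cluster": cluster,
        "task": {"type": job_name, "index": task_index},
    })


if __name__ == "__main__":
    cluster = build_cluster(PS_HOSTS, WORKER_HOSTS)
    # On the second worker instance you would export this as TF_CONFIG.
    print(build_tf_config(cluster, "worker", 1))
```

Every node gets the same cluster description; only the job name and task index differ from one instance to the next.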