Question

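I am running a distributed TensorFlow program with large input files (150 MB per example).

I would like a shared input queue of file names on the PS, so that the workers process different examples.

I would then like each worker's CPU to read from the shared input queue and produce the data for the GPU to process.

The following code is run only by the workers: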

with tf.Graph().as_default():

    with tf.device('/job:ps/replica:0/task:0'):
        file_queue = tf.train.string_input_producer(file_paths, shared_name='train_queue')

    with tf.device('/cpu:0'):
        input_tensors = model.input_fn(file_queue, ...)

    # sets variables to PS and ops default to GPU
    with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
        output_tensors = model.model_fn(input_tensors, ...)

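However, the Reader() for file_queue (which is inside model.input_fn()) is placed on the PS instead of on the worker's CPU, as was specified with tf.device().

This causes 150 MB messages to be sent between the PS and the workers, which slows down training (I only noticed this because Google Cloud ML Engine issues a warning when large messages are sent).

Why is input_fn() not being placed on the worker's CPU? Do a queue and its reader have to be on the same device?

Here is a link to my previous question, which may provide more context.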

Here is the code for input_fn():

def input_fn(file_queue, ...):
    reader = tf.TFRecordReader()
    _, example = reader.read(file_queue)
    image, ground_truth = my_decoder(example)
    image, ground_truth = tf.train.shuffle_batch([image, ground_truth], ...)
    return image, ground_truth

The problem is that tf.TFRecordReader() is placed on the PS. All the other ops (the decoder and the batching) are correctly placed on the worker's CPU.
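As a diagnostic aid, here is a minimal sketch of one way to check which device each op in the input pipeline was requested on. It assumes a TensorFlow installation that still exposes the TF1-style graph API under tf.compat.v1; the helper name requested_devices and the file name in the usage line are hypothetical and not part of the original code. Note that op.device only reports the device requested while the graph was being built. The final placement is decided at run time, and it can be logged by creating the session with ConfigProto(log_device_placement=True).

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API (assumed to be available)


def requested_devices(file_paths):
    """Build the queue and reader as in the question and report the
    device string requested for each op (hypothetical helper)."""
    graph = tf1.Graph()
    with graph.as_default():
        # Shared file-name queue, requested on the parameter server.
        with tf1.device('/job:ps/replica:0/task:0'):
            file_queue = tf1.train.string_input_producer(
                file_paths, shared_name='train_queue')

        # Reader and read op, requested on the local CPU.
        with tf1.device('/cpu:0'):
            reader = tf1.TFRecordReader()
            _, example = reader.read(file_queue)

    return {
        'queue': file_queue.queue_ref.op.device,
        'reader': reader.reader_ref.op.device,
        'read': example.op.device,
    }


if __name__ == '__main__':
    for name, device in requested_devices(['example.tfrecord']).items():
        print(name, '->', device)
```

If the requested devices look right here but the runtime placement log shows the reader on the PS, the move happened during placement rather than during graph construction, which narrows down where to look.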

TensorFlow: shared queue on the PS server with readers in the workers

0 answers: