How to speed up training of an RNN model that uses multiple GPUs in TensorFlow?

Time: 2017-12-11 23:09:11

Tags: tensorflow distributed-computing lstm recurrent-neural-network multiple-gpu

For example, the RNN is a dynamic 3-layer bidirectional LSTM with a hidden vector size of 200, and I have 4 GPUs to train the model. I saw a post that runs tf.nn.bidirectional_dynamic_rnn on subsets of a batch of samples, but this did not speed up the training process.
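For reference, the batch-splitting (data-parallel) approach described above presumably looks something like the multi-tower sketch below; every name in it (n_gpus, build_tower_loss, the placeholder shapes, the single-layer model and toy classifier head) is an illustrative assumption, not code from the post:

import tensorflow as tf

n_gpus = 4

def build_tower_loss(x_slice, y_slice):
  # Hypothetical model function: a single bidirectional LSTM layer with 200
  # units (the actual model is 3 layers) plus a toy classifier head.
  cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=200)
  cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=200)
  outputs, _ = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, x_slice,
                                               dtype=tf.float32)
  last = tf.concat(outputs, axis=-1)[:, -1, :]  # last time step, fw + bw
  logits = tf.layers.dense(last, 2)
  return tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_slice,
                                                     logits=logits))

X = tf.placeholder(tf.float32, [None, 50, 100])  # batch x time x features
y = tf.placeholder(tf.int32, [None])

# Split one batch into per-GPU slices; every tower shares the same variables.
x_slices = tf.split(X, n_gpus, axis=0)
y_slices = tf.split(y, n_gpus, axis=0)

tower_losses = []
for i in range(n_gpus):
  with tf.device('/gpu:%d' % i), tf.variable_scope('model', reuse=(i > 0)):
    tower_losses.append(build_tower_loss(x_slices[i], y_slices[i]))

# Average the tower losses; the optimizer backpropagates through every tower.
train_op = tf.train.AdamOptimizer().minimize(tf.reduce_mean(tower_losses))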

1 Answer:

Answer 0 (score: 1)

You could also try model parallelism. One approach is to write a cell wrapper like this, which creates the cell's ops on a specific device:

import tensorflow as tf

class DeviceCellWrapper(tf.nn.rnn_cell.RNNCell):
  """Wraps an RNN cell so that all of its ops are created on a given device."""
  def __init__(self, cell, device):
    self._cell = cell
    self._device = device

  @property
  def state_size(self):
    return self._cell.state_size

  @property
  def output_size(self):
    return self._cell.output_size

  def __call__(self, inputs, state, scope=None):
    # Every op the wrapped cell creates is pinned to the requested device.
    with tf.device(self._device):
      return self._cell(inputs, state, scope)

Then place each individual layer on its own dedicated GPU:

cell_fw = DeviceCellWrapper(cell=tf.nn.rnn_cell.LSTMCell(num_units=n_neurons, state_is_tuple=False), device='/gpu:0')
cell_bw = DeviceCellWrapper(cell=tf.nn.rnn_cell.LSTMCell(num_units=n_neurons, state_is_tuple=False), device='/gpu:1')  # a different GPU than the forward cell
outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, X, dtype=tf.float32)
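A hedged usage note (not part of the original answer): when cells are pinned to explicit devices, it usually helps to create the session with allow_soft_placement=True, so that any op lacking a GPU kernel falls back to the CPU instead of raising an error, and log_device_placement=True to verify where each op actually runs:

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
  sess.run(tf.global_variables_initializer())
  # a real training loop would call sess.run(train_op, feed_dict=...) here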