Question

I found two RNN implementations in TensorFlow.

The first implementation is this (lines 124 to 129). It uses a loop to define each input step of the RNN.

with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
        states.append(state)

The second implementation is this (lines 51 to 70). It does not use an explicit loop to define each input step of the RNN.

def RNN(_X, _istate, _weights, _biases):

    # input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2])  # permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, _X, initial_state=_istate)

    # Linear activation
    # Get inner loop last output
    return tf.matmul(outputs[-1], _weights['out']) + _biases['out']

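In the first implementation, I found no weight matrix between the input units and the hidden units; only the weight matrix between the hidden units and the output units is defined (lines 132 to 133).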

output = tf.reshape(tf.concat(1, outputs), [-1, size])
softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
logits = tf.matmul(output, softmax_w) + softmax_b
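To make the shapes in that projection concrete, here is a minimal sketch in plain Python lists (no TensorFlow), assuming `outputs` is a list of `num_steps` items of shape (batch_size, size). The sizes and values are made up for illustration. It mimics concatenating along axis 1 and then reshaping to [-1, size], which yields one row of length `size` per (example, time step) pair, ready to be multiplied by a [size, vocab_size] matrix.

```python
batch_size, num_steps, size = 2, 3, 4

# outputs[t][b] is the cell output for example b at time step t.
outputs = [[[100 * t + 10 * b + k for k in range(size)]
            for b in range(batch_size)]
           for t in range(num_steps)]

# Concatenate along axis 1: (batch_size, num_steps * size)
concat = [[v for t in range(num_steps) for v in outputs[t][b]]
          for b in range(batch_size)]

# Reshape to [-1, size]: (batch_size * num_steps, size), in row-major order
rows = [row[t * size:(t + 1) * size]
        for row in concat
        for t in range(num_steps)]

print(len(rows), len(rows[0]))  # 6 4
# Row b * num_steps + t holds the output for example b at time step t.
print(rows[1 * num_steps + 2] == outputs[2][1])  # True
```

Note the resulting row order is example-major (all steps of example 0, then all steps of example 1), so any targets fed to the loss need the same ordering.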

But in the second implementation, both weight matrices are defined (lines 42 to 47).

weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

I would like to know why.

Answer 1

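The difference I noticed is that the code in the second implementation uses tf.nn.rnn, which takes a list of inputs, one per time step, and produces a list of outputs, one per time step.

(inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size].)

So if you check the code at line 62 of the second implementation, the input data is split into a list of n_steps tensors, each of shape (batch_size, n_hidden):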

# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)

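In the first implementation, they loop over the time steps themselves: at each step they feed the input to the cell, get the corresponding output, and append it to the outputs list.

Code snippet from lines 113 to 117: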

outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
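The two styles compute the same thing; they differ only in who writes the loop. The sketch below is not TensorFlow code. It uses a toy scalar cell and a hypothetical helper named `static_rnn` (both invented for illustration, with no batch dimension) to show that an explicit loop and a helper that consumes a list of per-step inputs give identical outputs when the weights are shared across steps.

```python
import math

def rnn_cell(x, h, w_xh, w_hh):
    """Toy scalar RNN cell: returns (output, new_state)."""
    h_new = math.tanh(w_xh * x + w_hh * h)
    return h_new, h_new

def static_rnn(cell, inputs, initial_state, **weights):
    """Hypothetical helper: runs the cell over a list of per-step inputs."""
    outputs, state = [], initial_state
    for x in inputs:
        out, state = cell(x, state, **weights)
        outputs.append(out)
    return outputs, state

weights = {"w_xh": 0.5, "w_hh": -0.3}  # the same weights at every time step
inputs = [1.0, 0.0, -1.0]              # one input per time step

# Style 1: explicit loop, as in the first implementation.
loop_outputs, state = [], 0.0
for x in inputs:
    out, state = rnn_cell(x, state, **weights)
    loop_outputs.append(out)

# Style 2: pass the whole list to a helper, as in the second implementation.
helper_outputs, final_state = static_rnn(rnn_cell, inputs, 0.0, **weights)

print(loop_outputs == helper_outputs)  # True
```

In the first implementation, the call to reuse_variables() after step 0 plays the role that passing the same `weights` dictionary plays here: every time step uses one set of parameters.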

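Coming to your second question:

Look carefully at how the inputs are fed into the RNN in each implementation.

In the first implementation, the input placeholder has shape batch_size x num_steps and holds integer word ids. Here num_steps is the sequence length, not the hidden size; the ids are mapped to vectors of the hidden size before they reach the cell, which is why the loop can slice inputs[:, time_step, :] and why no separate input-to-hidden matrix appears: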

self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
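The loop quoted earlier slices inputs[:, time_step, :], so the cell receives a 3-D tensor rather than this 2-D integer placeholder. As far as I can tell, the first implementation gets there with an embedding lookup; the sketch below assumes that, and uses plain Python lists with made-up sizes and a made-up embedding table to show how integer ids of shape (batch_size, num_steps) become vectors of shape (batch_size, num_steps, size).

```python
vocab_size, size = 5, 3        # size plays the role of the hidden size
batch_size, num_steps = 2, 4

# Hypothetical embedding table: one row of length `size` per word id.
embedding = [[float(10 * word + k) for k in range(size)]
             for word in range(vocab_size)]

# Integer word ids, shape (batch_size, num_steps), like the placeholder.
input_data = [[0, 3, 1, 4],
              [2, 2, 0, 1]]

# Embedding lookup: replace each id with its row -> (batch_size, num_steps, size)
inputs = [[embedding[word] for word in sequence] for sequence in input_data]

# The slice inputs[:, time_step, :], written with lists.
time_step = 1
step_input = [inputs[b][time_step] for b in range(batch_size)]

print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # 2 4 3
print(step_input)  # [[30.0, 31.0, 32.0], [20.0, 21.0, 22.0]]
```

Under that assumption the embedding table is itself a trainable [vocab_size, size] matrix, so it does the job that the 'hidden' weights do in the second implementation.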

In the second implementation, by contrast, the initial input has shape (batch_size x n_steps x n_input). A weight matrix is therefore needed to project it to shape (n_steps x batch_size x hidden_size):

    # Input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2])  # Permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)
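The order of these operations matters: transposing before flattening puts all rows of one time step next to each other, so the final split yields one (batch_size, n_hidden) block per step. Here is a minimal sketch of the same pipeline with plain Python lists; the sizes, weights, and the `project` helper are all made up for illustration.

```python
batch_size, n_steps, n_input, n_hidden = 2, 3, 4, 5

# x[b][t] is the feature vector for example b at time step t.
x = [[[float(100 * b + 10 * t + i) for i in range(n_input)]
      for t in range(n_steps)]
     for b in range(batch_size)]

# Hypothetical projection parameters: (n_input, n_hidden) and (n_hidden,)
w_hidden = [[0.1 * (i + j) for j in range(n_hidden)] for i in range(n_input)]
b_hidden = [0.5] * n_hidden

def project(row):
    """One row of matmul(row, w_hidden) + b_hidden."""
    return [sum(row[i] * w_hidden[i][j] for i in range(n_input)) + b_hidden[j]
            for j in range(n_hidden)]

# transpose [1, 0, 2]: (n_steps, batch_size, n_input)
x_t = [[x[b][t] for b in range(batch_size)] for t in range(n_steps)]
# reshape [-1, n_input]: (n_steps * batch_size, n_input)
flat = [row for step in x_t for row in step]
# linear projection: (n_steps * batch_size, n_hidden)
hidden = [project(row) for row in flat]
# split along axis 0: n_steps chunks of shape (batch_size, n_hidden)
chunks = [hidden[t * batch_size:(t + 1) * batch_size] for t in range(n_steps)]

print(len(chunks), len(chunks[0]), len(chunks[0][0]))  # 3 2 5
# Chunk t, row b is the projection of example b at time step t.
print(chunks[1][0] == project(x[0][1]))  # True
```

Without the transpose, a contiguous block of batch_size rows would mix time steps of a single example, and the list handed to the RNN would no longer be one entry per step.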

I hope this helps.

What is the difference between the two RNN implementations in TensorFlow?
