I am currently working on implementing an LSTM network from scratch. Right now, I have a very simple model implemented in TensorFlow with the following code:
with tf.variable_scope(scope or 'network', reuse=tf.AUTO_REUSE):
    cells = []
    for size in state_sizes:
        cell = tf.nn.rnn_cell.LSTMCell(size, state_is_tuple=True, name='test')
        cells.append(cell)
    multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
    rnn_outputs, state = tf.nn.dynamic_rnn(multi_cell, X, initial_state=state_tuple)
    network_output = rnn_outputs
    state = state
where state_sizes = [4], so this builds a single LSTM cell with a state dimension of 4. However, trying to implement the same network from scratch does not produce the same results in the forward pass...
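For reference, the surrounding names (X / inputs, state_placeholder, state_tuple and zero_state) are wired up roughly like this; this is a simplified sketch assuming one layer, batch size 1 and two input features, not my exact code:

import numpy as np
import tensorflow as tf

num_layers, batch_size, state_size = 1, 1, 4
# X / inputs: (batch, time, features)
inputs = tf.placeholder(tf.float32, (batch_size, None, 2))
# one (c, m) pair per layer, fed in as a single array
state_placeholder = tf.placeholder(tf.float32, (num_layers, 2, batch_size, state_size))
unstacked = tf.unstack(state_placeholder, axis=0)
state_tuple = tuple(
    tf.nn.rnn_cell.LSTMStateTuple(unstacked[l][0], unstacked[l][1])
    for l in range(num_layers))
zero_state = np.zeros((num_layers, 2, batch_size, state_size))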
By looking at the tensorflow/python/ops/rnn_cell_impl.py module (in the TensorFlow package) and the method LSTMCell.call(), I implemented something similar (which should be the forward-propagation algorithm):
def propagate_one_step(inp, c_prev, m_prev, kernel, bias):
    tmp1 = np.concatenate((inp, m_prev))
    tmp2 = np.matmul(tmp1, kernel)
    lstm_matrix = tmp2 + bias
    split_index = int(lstm_matrix.shape[0] / 4)
    # i = input gate, j = new input, f = forget gate, o = output gate
    i = lstm_matrix[:split_index]
    j = lstm_matrix[split_index:(2 * split_index)]
    f = lstm_matrix[(2 * split_index):(3 * split_index)]
    o = lstm_matrix[(3 * split_index):]
    # 1 is the forget bias
    c_1 = sigmoid(f + 1) * c_prev + sigmoid(i) + np.tanh(j)  # typo here, see EDIT below
    m_1 = sigmoid(o) * np.tanh(c_1)
    return m_1, c_1
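The sigmoid helper used above is just the standard logistic function:

import numpy as np

def sigmoid(x):
    # elementwise logistic function
    return 1.0 / (1.0 + np.exp(-x))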
This is how I use it to propagate one step:
# get kernel and bias weights from the lstm
bias = sess.run(RNN.multi_cell._cells[0]._bias)
kernel = sess.run(RNN.multi_cell._cells[0]._kernel)
# define the initial states of my lstm implementation
m = np.zeros(4)
c = np.zeros(4)
tmp_in = np.array([1, 1])
m, c = propagate_one_step(tmp_in, c, m, kernel, bias)
# propagate one step with tensorflow
_current_state = zero_state # just a lot of zeros
feed_dict_fake = {
    inputs: tmp_in.reshape((1, 1, 2)),  # shape (batch, time, features)
    state_placeholder: _current_state}
output, _current_state = sess.run([RNN.network_output, RNN.state], feed_dict=feed_dict_fake)
# an LSTMStateTuple is ordered (c, h), so index 0 is the cell state
c_tf = _current_state[0][0]
m_tf = _current_state[0][1]
At this point I would expect m_tf to be equal to m and c_tf to be equal to c, but that is not the case. Can anyone tell me where my implementation differs from TensorFlow's?
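Concretely, the check that fails looks like this (I compare with a small tolerance, since bit-exact equality is not expected anyway: TF computes in float32 while my NumPy code defaults to float64):

# m_tf / c_tf carry a leading batch dimension of 1, which np.allclose broadcasts away
print(np.allclose(m, m_tf, atol=1e-5))  # False for me
print(np.allclose(c, c_tf, atol=1e-5))  # False for me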
My current guess is that TF does more to define the computation than just running LSTMCell.call().
EDIT: There was a typo in my implementation of the LSTM propagation: tanh(j) has to be multiplied in, not added. The correct code is as follows:
def propagate_one_step(inp, c_prev, m_prev, kernel, bias):
    tmp1 = np.concatenate((inp, m_prev))
    tmp2 = np.matmul(tmp1, kernel)
    lstm_matrix = tmp2 + bias
    split_index = int(lstm_matrix.shape[0] / 4)
    i = lstm_matrix[:split_index]
    j = lstm_matrix[split_index:(2 * split_index)]
    f = lstm_matrix[(2 * split_index):(3 * split_index)]
    o = lstm_matrix[(3 * split_index):]
    # 1 is the forget bias
    c_1 = sigmoid(f + 1) * c_prev + sigmoid(i) * np.tanh(j)
    m_1 = sigmoid(o) * np.tanh(c_1)
    return m_1, c_1
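For completeness, propagating a whole sequence is then just a matter of iterating this function and threading c and m through; a sketch, where seq is a hypothetical input array of shape (T, 2):

m = np.zeros(4)
c = np.zeros(4)
for t in range(seq.shape[0]):  # seq: hypothetical input sequence, shape (T, 2)
    m, c = propagate_one_step(seq[t], c, m, kernel, bias)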