I am currently working on implementing an LSTM network from scratch. Right now, I have a very simple model implemented in TensorFlow with the following code:
with tf.variable_scope(scope or 'network', reuse=tf.AUTO_REUSE):
    cells = []
    for size in state_sizes:
        cell = tf.nn.rnn_cell.LSTMCell(size, state_is_tuple=True, name='test')
        cells.append(cell)
    multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
    rnn_outputs, state = tf.nn.dynamic_rnn(multi_cell, X, initial_state=state_tuple)
    network_output = rnn_outputs
    state = state
where state_sizes = [4], so this builds a single LSTM cell with a state dimension of 4. However, trying to implement the same network from scratch does not produce the same results in the forward pass...
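For reference, the surrounding names (X / inputs, state_placeholder, state_tuple and zero_state) are wired up roughly like this; this is a simplified sketch assuming one layer, batch size 1 and two input features, not my exact code:

import numpy as np
import tensorflow as tf

num_layers, batch_size, state_size = 1, 1, 4
# X / inputs: (batch, time, features)
inputs = tf.placeholder(tf.float32, (batch_size, None, 2))
# one (c, m) pair per layer, fed in as a single array
state_placeholder = tf.placeholder(tf.float32, (num_layers, 2, batch_size, state_size))
unstacked = tf.unstack(state_placeholder, axis=0)
state_tuple = tuple(
    tf.nn.rnn_cell.LSTMStateTuple(unstacked[l][0], unstacked[l][1])
    for l in range(num_layers))
zero_state = np.zeros((num_layers, 2, batch_size, state_size))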
By looking at the tensorflow/python/ops/rnn_cell_impl.py module (in the TensorFlow package) and the method LSTMCell.call(), I implemented something similar (which should be the forward-propagation algorithm):
def propagate_one_step(inp, c_prev, m_prev, kernel, bias):
    tmp1 = np.concatenate((inp, m_prev))
    tmp2 = np.matmul(tmp1, kernel)
    lstm_matrix = tmp2 + bias
    split_index = int(lstm_matrix.shape[0] / 4)
    # i = input gate, j = new input, f = forget gate, o = output gate
    i = lstm_matrix[:split_index]
    j = lstm_matrix[split_index:(2 * split_index)]
    f = lstm_matrix[(2 * split_index):(3 * split_index)]
    o = lstm_matrix[(3 * split_index):]
    # 1 is the forget bias
    c_1 = sigmoid(f + 1) * c_prev + sigmoid(i) + np.tanh(j)  # typo here, see EDIT below
    m_1 = sigmoid(o) * np.tanh(c_1)
    return m_1, c_1
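The sigmoid helper used above is just the standard logistic function:

import numpy as np

def sigmoid(x):
    # elementwise logistic function
    return 1.0 / (1.0 + np.exp(-x))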
This is how I use it to propagate one step:
# get kernel and bias weights from the lstm
bias = sess.run(RNN.multi_cell._cells[0]._bias)
kernel = sess.run(RNN.multi_cell._cells[0]._kernel)
# define the initial states of my lstm implementation
m = np.zeros(4)
c = np.zeros(4)
tmp_in = np.array([1, 1])
m, c = propagate_one_step(tmp_in, c, m, kernel, bias)
# propagate one step with tensorflow
_current_state = zero_state # just a lot of zeros
feed_dict_fake = {
    inputs: tmp_in.reshape((1, 1, 2)),  # shape (batch, time, features)
    state_placeholder: _current_state}
output, _current_state = sess.run([RNN.network_output, RNN.state], feed_dict=feed_dict_fake)
# an LSTMStateTuple is ordered (c, h), so index 0 is the cell state
c_tf = _current_state[0][0]
m_tf = _current_state[0][1]
At this point I would expect m_tf to be equal to m and c_tf to be equal to c, but that is not the case. Can anyone tell me where my implementation differs from TensorFlow's?
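Concretely, the check that fails looks like this (I compare with a small tolerance, since bit-exact equality is not expected anyway: TF computes in float32 while my NumPy code defaults to float64):

# m_tf / c_tf carry a leading batch dimension of 1, which np.allclose broadcasts away
print(np.allclose(m, m_tf, atol=1e-5))  # False for me
print(np.allclose(c, c_tf, atol=1e-5))  # False for me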
My current guess is that TF does more to define the computation than just running LSTMCell.call().
EDIT: There was a typo in my implementation of the LSTM propagation: tanh(j) has to be multiplied in, not added. The correct code is as follows:
def propagate_one_step(inp, c_prev, m_prev, kernel, bias):
    tmp1 = np.concatenate((inp, m_prev))
    tmp2 = np.matmul(tmp1, kernel)
    lstm_matrix = tmp2 + bias
    split_index = int(lstm_matrix.shape[0] / 4)
    i = lstm_matrix[:split_index]
    j = lstm_matrix[split_index:(2 * split_index)]
    f = lstm_matrix[(2 * split_index):(3 * split_index)]
    o = lstm_matrix[(3 * split_index):]
    # 1 is the forget bias
    c_1 = sigmoid(f + 1) * c_prev + sigmoid(i) * np.tanh(j)
    m_1 = sigmoid(o) * np.tanh(c_1)
    return m_1, c_1
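For completeness, propagating a whole sequence is then just a matter of iterating this function and threading c and m through; a sketch, where seq is a hypothetical input array of shape (T, 2):

m = np.zeros(4)
c = np.zeros(4)
for t in range(seq.shape[0]):  # seq: hypothetical input sequence, shape (T, 2)
    m, c = propagate_one_step(seq[t], c, m, kernel, bias)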