我对tf.contrib.rnn.LSTMBlockCell
和tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell
有相同的问题:
如何从numpy数组正确初始化LSTM权重?以下代码片段已执行,但似乎没有执行 做我想要的:
train_data = np.load('mnist_train_data.npy').reshape(-1,28,28)
train_label = np.load('mnist_train_label.npy')
params = [np.random.randn(28+128, 4*128), np.zeros(4*128)]
X = tf.placeholder(tf.float32, shape=[54999, 28, 28])
y = tf.placeholder(tf.int64, None)
state = LSTMStateTuple(*(tf.zeros((54999, 128), dtype=tf.float32) for _ in range(2)))
cell = tf.contrib.rnn.LSTMBlockCell(128)
cell.build(tf.TensorShape((None, 28)))
cell.set_weights(params)
initial_weights = cell.get_weights()
print(np.array_equal(params[0], initial_weights[0]))
w1 = tf.Variable(np.random.randn(128, 10), dtype=tf.float32)
b1 = tf.Variable(np.zeros(10), dtype=tf.float32)
full_seq, current_state = tf.nn.dynamic_rnn(cell, X, initial_state=state, dtype=tf.float32)
output = tf.matmul(current_state[1], w1)
output += b1
loss = tf.losses.softmax_cross_entropy(y, output)
train_step = tf.train.AdamOptimizer(0.01).minimize(loss)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for i in range(1):
feed_dict = {X: train_data, y: train_label}
sess.run(train_step, feed_dict=feed_dict)
final_weights = cell.get_weights()
print(np.array_equal(initial_weights[0], final_weights[0]))
这会在第一个打印语句中打印出False
,因此numpy数组实际上似乎没有用作权重。
此外,在训练课程结束后,此信息会打印出来True
,这意味着在训练期间这些权重实际上并未更新。
预先感谢您对此主题的任何帮助。