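I'm trying to train a recurrent neural network in TensorFlow (r0.10, Python 3.5) on a toy classification problem, but I'm getting confusing results. The input is a sequence of zeros and ones. The target class for each element is the number represented by the current and previous values, treated as a binary number. For example: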
input sequence: [0, 0, 1, 0, 1, 1]
binary digits : [-, [0,0], [0,1], [1,0], [0,1], [1,1]]
target class : [-, 0, 1, 2, 1, 3]
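For reference, the labelling rule above can be written as a small NumPy function. This is a sketch, and the name `make_labels` is hypothetical (it is not part of the original code). It works row by row on a [batch, time] array, so each sequence only ever looks at its own previous digit, and it treats the digit before the first step as 0, as the question's code does:

```python
import numpy as np


def make_labels(x):
    """Return 2 * previous_digit + current_digit for every step of every row.

    x has shape [batch_size, sequence_size] and holds 0/1 values. The
    "previous digit" before the first step of a row is taken to be 0.
    """
    x = np.asarray(x)
    prev = np.zeros_like(x)
    prev[:, 1:] = x[:, :-1]  # shift right by one step within each row
    return 2 * prev + x


seq = np.array([[0, 0, 1, 0, 1, 1]])
print(make_labels(seq))  # [[0 0 1 2 1 3]]
```

From the second position onward, the result matches the target classes in the example.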
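I pass the data to tf.nn.rnn() as a list whose elements are [batch_size x input_size] tensors. I took this to be a list of sequences, but the documentation isn't clear to me about which dimension is treated as the time dimension. Is that understanding correct? If it is, then I don't understand why the RNN model isn't learning correctly.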
It's hard to extract a small piece of code that runs through my full RNN. This is the best I could do (it's adapted mostly from the PTB model here and the char-rnn model here):
import tensorflow as tf
import numpy as np
input_size = 1
batch_size = 50
T = 2
lstm_size = 5
lstm_layers = 2
num_classes = 4
learning_rate = 0.1
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True)
lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * lstm_layers, state_is_tuple=True)
x = tf.placeholder(tf.float32, [T, batch_size, input_size])
y = tf.placeholder(tf.int32, [T * batch_size * input_size])
init_state = lstm.zero_state(batch_size, tf.float32)
inputs = [tf.squeeze(input_, [0]) for input_ in tf.split(0,T,x)]
outputs, final_state = tf.nn.rnn(lstm, inputs, initial_state=init_state)
w = tf.Variable(tf.truncated_normal([lstm_size, num_classes]), name='softmax_w')
b = tf.Variable(tf.truncated_normal([num_classes]), name='softmax_b')
output = tf.concat(0, outputs)
logits = tf.matmul(output, w) + b
probs = tf.nn.softmax(logits)
cost = tf.reduce_mean(tf.nn.seq2seq.sequence_loss_by_example(
    [logits], [y], [tf.ones_like(y, dtype=tf.float32)]))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
tvars = tf.trainable_variables()
max_grad_norm = 5  # assumed value: the clip norm was lost from the original post
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), max_grad_norm)
train_op = optimizer.apply_gradients(zip(grads, tvars))
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    curr_state = sess.run(init_state)
    for i in range(3000):
        # Create toy data where the true class is the value represented
        # by the current and previous value treated as binary, i.e.
        # [0, 0] -> 0, [0, 1] -> 1, [1, 0] -> 2, [1, 1] -> 3
        train_x = np.random.randint(0,2,(T * batch_size * input_size))
        train_y = train_x + np.concatenate(([0], (train_x[:-1] * 2)))
        # Reshape into T x batch_size x input_size
        train_x = np.reshape(train_x, (T, batch_size, input_size))
        feed_dict = {
            x: train_x, y: train_y
        }
        # Carry the LSTM state over from the previous batch
        for j, (c, h) in enumerate(init_state):
            feed_dict[c] = curr_state[j].c
            feed_dict[h] = curr_state[j].h
        fetch_dict = {
            'cost': cost, 'final_state': final_state, 'train_op': train_op
        }
        # Evaluate the graph
        fetches = sess.run(fetch_dict, feed_dict=feed_dict)
        curr_state = fetches['final_state']
        if i % 300 == 0:
            print('step {}, train cost: {}'.format(i, fetches['cost']))

    # Test
    test_x = np.array([[0],[0],[1],[0],[1],[1]]*(T*batch_size*input_size))
    test_x = test_x[:(T*batch_size*input_size),:]
    probs_out = sess.run(probs, feed_dict={
        x: np.reshape(test_x, [T, batch_size, input_size]),
        init_state: curr_state
    })
    # Get the softmax outputs for the points in the sequence
    # that have [0, 0], [0, 1], [1, 0], [1, 1] as their
    # last two values.
    for i in [1, 2, 3, 5]:
        print('{}: [{:.4f} {:.4f} {:.4f} {:.4f}]'.format(
            [1, 2, 3, 5].index(i), *list(probs_out[i,:])))
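Running the test at the end prints:

0: [0.4899 0.0007 0.5080 0.0014]
1: [0.0003 0.5155 0.0009 0.4833]
2: [0.5078 0.0011 0.4889 0.0021]
3: [0.0003 0.5052 0.0009 0.4936]

This shows that the model only learns to distinguish [0,2] from [1,3], i.e. whether the current digit is 0 or 1. Why isn't this model learning to use the previous values in the sequence?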
Answer (score: 3)
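I figured it out with the help of this blog post (it has great diagrams of the input tensors). It turns out I didn't understand tf.nn.rnn() correctly. Its inputs argument is a list with one element per time step (sequence_size in the code below), where each element has shape batch_size x input_size. This means that the consecutive steps of each sequence are spread across the elements of the list. I had assumed that each sequence would stay together, so that each element of the list inputs would hold one whole sequence.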
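To see why the question's training data could not be learned, here is a small NumPy check. It is an illustration only, not TensorFlow code, and the sizes are shrunk for readability. It numbers each element of a flat array by its position, reshapes it to [T, batch_size] as the question does, and asks where the flat predecessor of entry [t, b] ends up. That flat predecessor is what the question's label formula (`train_x[:-1] * 2`) treats as the "previous" value:

```python
import numpy as np

T, batch_size = 2, 5  # small sizes for readability (the question uses 50)

# Position of every element in the flat array that train_y was computed on.
flat_index = np.arange(T * batch_size)

# The question reshapes the flat array into time-major [T, batch_size].
time_major = flat_index.reshape(T, batch_size)

t, b = 1, 3
current = time_major[t, b]
flat_previous = current - 1  # the element the label formula treats as "previous"

# Where does that flat predecessor sit in the [T, batch_size] grid?
prev_t, prev_b = np.unravel_index(flat_previous, (T, batch_size))
print((int(prev_t), int(prev_b)))  # (1, 2): same step, neighbouring sequence
print(int(time_major[t - 1, b]))   # 3: the true previous step of sequence b
```

The label therefore depends on a different sequence at the same time step, which the RNN never sees while processing sequence b. That is consistent with the model only picking up the current digit.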
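In hindsight this makes sense. The RNN processes all sequences in the batch in parallel, one time step at a time. It runs the first step of every sequence (the first element of the list), then the second step of every sequence (the second element of the list), and so on.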
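The list layout described above can be mimicked in NumPy. This is a sketch: np.split and np.squeeze stand in for tf.split and tf.squeeze, and the sizes are arbitrary. Starting from a batch-major array of shape [batch_size, sequence_size, num_features], splitting along axis 1 gives one [batch_size, num_features] array per time step, and element t holds step t of every sequence:

```python
import numpy as np

batch_size, sequence_size, num_features = 3, 4, 1

# Give every entry a recognisable value: 10 * sequence_id + time_step.
x = 10 * np.arange(batch_size)[:, None] + np.arange(sequence_size)[None, :]
x = x.reshape(batch_size, sequence_size, num_features)

# Same idea as [tf.squeeze(i, [1]) for i in tf.split(1, sequence_size, x)]
inputs = [np.squeeze(step, axis=1) for step in np.split(x, sequence_size, axis=1)]

print(len(inputs))        # 4: one list element per time step
print(inputs[0].shape)    # (3, 1): [batch_size, num_features]
print(inputs[2].ravel())  # [ 2 12 22]: step 2 of sequences 0, 1 and 2
```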
import tensorflow as tf
import numpy as np
sequence_size = 50
batch_size = 7
num_features = 1
lstm_size = 5
lstm_layers = 2
num_classes = 4
learning_rate = 0.1
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True)
lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * lstm_layers, state_is_tuple=True)
x = tf.placeholder(tf.float32, [batch_size, sequence_size, num_features])
y = tf.placeholder(tf.int32, [batch_size * sequence_size * num_features])
init_state = lstm.zero_state(batch_size, tf.float32)
inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(1,sequence_size,x)]
outputs, final_state = tf.nn.rnn(lstm, inputs, initial_state=init_state)
w = tf.Variable(tf.truncated_normal([lstm_size, num_classes]), name='softmax_w')
b = tf.Variable(tf.truncated_normal([num_classes]), name='softmax_b')
output = tf.reshape(tf.concat(1, outputs), [-1, lstm_size])
logits = tf.matmul(output, w) + b
probs = tf.nn.softmax(logits)
cost = tf.reduce_mean(tf.nn.seq2seq.sequence_loss_by_example(
    [logits], [y], [tf.ones_like(y, dtype=tf.float32)]))
# Now optimize on that cost
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
tvars = tf.trainable_variables()
max_grad_norm = 5  # assumed value: the clip norm was lost from the original post
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), max_grad_norm)
train_op = optimizer.apply_gradients(zip(grads, tvars))
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    curr_state = sess.run(init_state)
    for i in range(3000):
        # Create toy data where the true class is the value represented
        # by the current and previous value treated as binary, i.e.
        # [0, 0] -> 0, [0, 1] -> 1, [1, 0] -> 2, [1, 1] -> 3
        train_x = np.random.randint(0,2,(batch_size * sequence_size * num_features))
        train_y = train_x + np.concatenate(([0], (train_x[:-1] * 2)))
        # Reshape into batch_size x sequence_size x num_features
        train_x = np.reshape(train_x, [batch_size, sequence_size, num_features])
        feed_dict = {
            x: train_x, y: train_y
        }
        # Carry the LSTM state over from the previous batch
        for j, (c, h) in enumerate(init_state):
            feed_dict[c] = curr_state[j].c
            feed_dict[h] = curr_state[j].h
        fetch_dict = {
            'cost': cost, 'final_state': final_state, 'train_op': train_op
        }
        # Evaluate the graph
        fetches = sess.run(fetch_dict, feed_dict=feed_dict)
        curr_state = fetches['final_state']
        if i % 300 == 0:
            print('step {}, train cost: {}'.format(i, fetches['cost']))

    # Test
    test_x = np.array([[0],[0],[1],[0],[1],[1]]*(batch_size * sequence_size * num_features))
    test_x = test_x[:(batch_size * sequence_size * num_features),:]
    probs_out = sess.run(probs, feed_dict={
        x: np.reshape(test_x, [batch_size, sequence_size, num_features]),
        init_state: curr_state
    })
    # Get the softmax outputs for the points in the sequence
    # that have [0, 0], [0, 1], [1, 0], [1, 1] as their
    # last two values.
    for i in [1, 2, 3, 5]:
        print('{}: [{:.4f} {:.4f} {:.4f} {:.4f}]'.format(
            [1, 2, 3, 5].index(i), *list(probs_out[i,:])))
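It is worth checking that the rows of logits line up with y in this version. The NumPy sketch below uses np.concatenate in place of tf.concat(1, outputs), fake per-step outputs, and arbitrary sizes. It shows that concatenating the per-step outputs along axis 1 and reshaping to [-1, lstm_size] orders the rows sequence by sequence. That matches the batch-major flattening of train_y:

```python
import numpy as np

batch_size, sequence_size, lstm_size = 2, 3, 4

# Fake per-step outputs: every entry of outputs[t][b] equals 10 * b + t.
outputs = [
    np.tile((10 * np.arange(batch_size) + t)[:, None], (1, lstm_size))
    for t in range(sequence_size)
]

# Same idea as tf.reshape(tf.concat(1, outputs), [-1, lstm_size])
output = np.concatenate(outputs, axis=1).reshape(-1, lstm_size)

# Each row is tagged with 10 * sequence_id + time_step.
row_ids = output[:, 0]
print(row_ids)  # [ 0  1  2 10 11 12]

# Labels tagged the same way and flattened batch-major, like train_y.
labels = (10 * np.arange(batch_size)[:, None] + np.arange(sequence_size)).reshape(-1)
print(np.array_equal(row_ids, labels))  # True
```

The same reasoning explains the test indices [1, 2, 3, 5]. With batch-major rows, they are steps 1, 2, 3 and 5 of the first sequence. In the question's time-major version, they were step 0 of four different sequences.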