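I am trying to implement the 2D LSTM from this paper. In particular, I want to do it dynamically, so I am using tf.while_loop. In brief, the network works as follows.
The difference between a 2D LSTM and a regular LSTM is that there is a recurrent connection both to the previous element in the sequence and to the pixel directly above the current pixel, so pixel (i, j) is connected to (i - 1, j) and (i, j - 1).
I tried to do this with tf.while_loop: in each iteration of the loop I accumulate the activations and cell states into tensors whose shape I allow to vary. This is what the code block below attempts to do.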
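To make the dependency pattern concrete, here is a minimal sketch in plain Python (no TensorFlow) of which flattened positions a pixel reads from when the image is scanned row by row. The helper name `neighbor_indices` is hypothetical and not part of my actual code; `None` stands in for the zero state used at the image border.

```python
def neighbor_indices(row, col, width):
    """Return (left, top) flattened indices for pixel (row, col).

    Positions are flattened in row-major order, so pixel (row, col) sits at
    row * width + col. None marks a neighbor that falls outside the image,
    which is where a zero state would be used instead.
    """
    left = row * width + (col - 1) if col > 0 else None
    top = (row - 1) * width + col if row > 0 else None
    return left, top


if __name__ == "__main__":
    width = 4
    for row, col in [(0, 0), (0, 2), (2, 0), (2, 3)]:
        print((row, col), neighbor_indices(row, col, width))
```

Both neighbors always have a smaller flattened index than the current pixel, so a single left-to-right, top-to-bottom pass has everything it needs at each step.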
import tensorflow as tf

def single_lstm_layer(inputs, height, width, units, direction = 'tl'):
    with tf.variable_scope(direction) as scope:
        # Get 2D lstm cell
        cell = lstm_cell
        # Position in sequence
        row, col = tf.to_int32(0), tf.to_int32(0)
        # Used when i - 1 < 0 or j - 1 < 0
        zero_state = tf.fill([1, units], 0.0)
        # Get first activation and cell_state
        output, state = cell(inputs.read(row * width + col), zero_state, zero_state, zero_state, zero_state)
        # These are currently of shape (1, units); they will ultimately be of shape
        # (height * width, units)
        activations = output
        cell_states = state
        col += 1

    with tf.variable_scope(direction, reuse = True) as scope:
        def loop_fn(activations, cell_states, row, col):
            # Read next input in sequence
            i = inputs.read(row * width + col)
            # If we are not in the first row then we want to get the activation/cell_state
            # above us. Otherwise use zero state.
            hidden_state_t = tf.cond(tf.greater_equal(row - 1, 0),
                                     lambda:tf.gather(activations, [(row - 1) * (width) + col]),
                                     lambda:tf.identity(zero_state))
            cell_state_t = tf.cond(tf.greater_equal(row - 1, 0),
                                   lambda:tf.gather(cell_states, [(row - 1) * (width) + col]),
                                   lambda:tf.identity(zero_state))
            # If we are not in the first col then we want to get the activation/cell_state
            # left of us. Otherwise use zero state.
            hidden_state_l = tf.cond(tf.greater_equal(col - 1, 0),
                                     lambda:tf.gather(activations, [row * (width) + col - 1]),
                                     lambda:tf.identity(zero_state))
            cell_state_l = tf.cond(tf.greater_equal(col - 1, 0),
                                   lambda:tf.gather(cell_states, [row * (width) + col - 1]),
                                   lambda:tf.identity(zero_state))
            # Using previous activations/cell_states get current activation/cell_state
            output, state = cell(i, hidden_state_l, hidden_state_t, cell_state_l, cell_state_t)
            # Append to bottom, will increase number of rows by 1
            activations = tf.concat(0, [activations, output])
            cell_states = tf.concat(0, [cell_states, state])
            # Move to next item in sequence
            col = tf.cond(tf.equal(col, width - 1), lambda:tf.mul(col, 0), lambda:tf.add(col, 1))
            row = tf.cond(tf.equal(col, 0), lambda:tf.add(row, 1), lambda:tf.identity(row))
            return activations, cell_states, row, col

        row, col = tf.to_int32(0), tf.constant(1)
        activations, cell_states, _, _ = tf.while_loop(
            cond = lambda activations, cell_states, row, col: tf.logical_and(
                tf.less_equal(row, height - 1), tf.less_equal(col, width - 1)),
            body = loop_fn,
            loop_vars = (activations,
                         cell_states,
                         row,
                         col),
            shape_invariants = (tf.TensorShape((None, units)),
                                tf.TensorShape((None, units)),
                                tf.TensorShape([]),
                                tf.TensorShape([]),
                                ),
        )
        # Return activations with shape [height, width, units]
        return tf.pack(tf.split(0, height, activations))
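This works, at least in the forward direction. That is, if I look at what is returned in a session, I get the 3D tensor I want, call it T, of shape [height, width, units], where T[i, j, :] contains the activations of the LSTM cell at input (i, j).
I then want to classify each pixel. To do this I convolve T with a filter W, reshape the result to [height * width, num_labels], and construct a cross-entropy loss.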
T = tf.nn.conv2d(T, W, strides = [1, 1, 1, 1], padding = 'VALID')
T = tf.reshape(T, [height * width, num_labels])
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels = tf.reshape(labels, [height * width, num_labels]),
        logits = T)
)
optimizer = tf.train.AdagradOptimizer(0.01).minimize(loss)
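For reference, the loss above is meant to be the softmax cross-entropy of each pixel, averaged over all height * width pixels. The following is a rough plain-Python sketch of that computation, assuming one-hot labels; the function names are hypothetical and it is only meant to spell out the arithmetic, not to mirror TensorFlow's implementation.

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy between softmax(logits) and a one-hot label for one pixel."""
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # log softmax of entry x is x - log_sum; weight by the label and negate
    return -sum(y * (x - log_sum) for x, y in zip(logits, one_hot))


def mean_pixel_loss(logits_rows, label_rows):
    """Average the per-pixel loss over all height * width rows."""
    losses = [softmax_cross_entropy(l, y)
              for l, y in zip(logits_rows, label_rows)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two pixels, two labels, uniform logits: each pixel contributes log(2).
    print(mean_pixel_loss([[0.0, 0.0], [0.0, 0.0]], [[1, 0], [0, 1]]))
```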
However, when I now try this with a 28 x 28 image and 32 units
sess.run(optimizer, feed_dict = feed_dict)
I get the following error:
  File "Assignment2/train_model.py", line 52, in <module>
    train_models()
  File "/Assignment2/train_model.py", line 12, in train_models
    image, out, labels, optomizer, accuracy, prediction, ac = build_graph(28, 28)
  File "/Assignment2/multidimensional.py", line 101, in build_graph
    optimizer = tf.train.AdagradOptimizer(0.01).minimize(loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 196, in minimize
    grad_loss=grad_loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 253, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients.py", line 491, in gradients
    in_grad.set_shape(t_in.get_shape())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 408, in set_shape
    self._shape = self._shape.merge_with(shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 579, in merge_with
    (self, other))
ValueError: Shapes (784, 32) and (1, 32) are not compatible
I think this is a problem with computing the gradients through the tf.while_loop, but I am pretty lost at this point.
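One thing I did notice is that the two shapes in the error match the size of the accumulator when the loop is entered, (1, 32), and when it exits, (784, 32). The plain-Python sketch below only tracks the row count of `activations` through the same loop logic to confirm those numbers. It does not reproduce anything TensorFlow does internally, and the function name is hypothetical.

```python
def simulate_accumulator_rows(height, width):
    """Track how many rows `activations` has before each loop condition check.

    The first pixel is processed before the loop, so the accumulator starts
    with one row; every loop iteration appends one more row.
    """
    rows = 1          # shape (1, units) when the loop is entered
    row, col = 0, 1   # the loop starts at the second pixel
    history = [rows]
    while row <= height - 1 and col <= width - 1:
        rows += 1     # the concat appends one row of shape (1, units)
        # Advance to the next pixel, wrapping to the next row at the end
        col = 0 if col == width - 1 else col + 1
        row = row + 1 if col == 0 else row
        history.append(rows)
    return history


if __name__ == "__main__":
    history = simulate_accumulator_rows(28, 28)
    print(history[0], history[-1])
```

So the loop variable really does have a different first dimension at entry and at exit, which is presumably what the gradient code is tripping over when it merges shapes.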