Question

I am trying to implement Gaussian attention with Keras + TensorFlow, as described here: http://akosiorek.github.io/ml/2017/10/14/visual-attention.html#mjx-eqn-att

To do so, I wrote a custom Keras layer like this (compared with the blog post, I adapted the gaussian_mask method):

def gaussian_mask(u, s, d, R, C, transpose=False):
    """
    :param u: tf.Tensor, centre of the first Gaussian.
    :param s: tf.Tensor, standard deviation of Gaussians.
    :param d: tf.Tensor, shift between Gaussian centres.
    :param R: int, number of rows in the mask, there is one Gaussian per row.
    :param C: int, number of columns in the mask.
    """
    # indices to create centres
    R = tf.to_float(tf.reshape(tf.range(R), (R, 1, 1)))
    C = tf.to_float(tf.reshape(tf.range(C), (1, C, 1)))


    centres = u[:, np.newaxis, np.newaxis] + R * d
    column_centres = C - centres
    mask = tf.exp(-.5 * tf.square(column_centres / s))
    # we add eps for numerical stability
    normalised_mask = mask / (tf.reduce_sum(mask, 1, keep_dims=True) + 1e-8)

    return normalised_mask

class visual_attention_layer(Layer):
    def __init__(self, output_dim, transpose=False, **kwargs):
        self.output_dim = output_dim
        self.transpose = transpose
        super(visual_attention_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(visual_attention_layer, self).build(input_shape)

    def call(self, x): 
        x_x, x_y, input_img = x

        u_x,s_x,d_x = tf.split(x1,3,1)
        u_y,s_y,d_y = tf.split(x2,3,1)


        W = input_img.shape[1]
        H = W = input_img.shape[2]
        Ay = gaussian_mask(u_y, s_y, d_y, self.output_dim[0], H)
        Ax = gaussian_mask(u_x, s_x, d_x, self.output_dim[0], W)

        input_img = tf.transpose(input_img, perm=[0,3,1,2])
        Ay = tf.transpose(Ay, perm=[0, 3, 1, 2])
        Ax = tf.transpose(Ax, perm=[0, 3, 1, 2])


        glimpse = tf.matmul( input_img, Ax, transpose_b=True)
        glimpse = tf.matmul(Ay, glimpse)
        glimpse = tf.transpose(glimpse, perm=[0,2,3,1])

        return glimpse

    def compute_output_shape(self, input_shape):
        return (self.output_dim[0], self.output_dim[1], input_shape[2][3])

I then use it like this:

inputs = Input(shape=(28,28,1))

x = Conv2D(64, kernel_size=(3,3), activation="relu")(inputs)
x = MaxPool2D()(x)
x = Conv2D(64, kernel_size=(3,3), activation="relu")(x)
x = MaxPool2D()(x)
x = Flatten()(x)

x1 = Dense(3, activation="sigmoid")(x)
x2 = Dense(3, activation="sigmoid")(x)
x = visual_attention_layer(output_dim=(20,20))([x1,x2, inputs])

x = Conv2D(64, kernel_size=(3,3), activation="relu")(x)
#x = MaxPool2D()(x)
x = Conv2D(64, kernel_size=(3,3), activation="relu")(x)
x = Flatten()(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=5, batch_size=1)

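The model compiles fine (unless I use the MaxPool2D that is now commented out, in which case I get "IndexError: tuple index out of range"). However, when I try to train it, I get the following error: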

InvalidArgumentError                      Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1321     try:
-> 1322       return fn(*args)
   1323     except errors.OpError as e:

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1306       return self._call_tf_sessionrun(
-> 1307           options, feed_dict, fetch_list, target_list, run_metadata)
   1308 

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1408           self._session, options, feed_dict, fetch_list, target_list,
-> 1409           run_metadata)
   1410     else:

InvalidArgumentError: Matrix size-incompatible: In[0]: [1,16384], In[1]: [1024,10]
     [[Node: dense_251/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training_22/RMSprop/gradients/dense_251/MatMul_grad/MatMul"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](flatten_153/Reshape, dense_251/kernel/read)]]
     [[Node: loss_26/mul/_579 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1108_loss_26/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Can someone help me figure out what I am doing wrong?

Answer 1

The exception messages that Keras / TensorFlow emit are (to be honest) not as helpful as one would hope.

One thing you should always check is: am I computing the output shape of my custom layer correctly? You are returning:

return (self.output_dim[0], self.output_dim[1], input_shape[2][3])

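However, this completely ignores the fact that your data will be batched (the shape you return only has rank 3). You can fix this by adding None as the first item of the tuple: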

return (None, self.output_dim[0], self.output_dim[1], input_shape[2][3])
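
To see why the missing batch axis yields exactly the numbers in the error message, the shape bookkeeping can be replayed in plain Python. The sketch below is a simplified model of how Keras 2.x appears to infer static shapes for Conv2D (valid padding, stride 1, channels last) and Flatten. The helper names are hypothetical and this is not the library's actual code.

```python
from functools import reduce
from operator import mul


def conv2d_valid_shape(input_shape, filters, kernel=3):
    """Static shape after a 'valid', stride-1 convolution (channels last).

    Everything between the first and the last entry is treated as a
    spatial axis, which is the assumption this sketch makes about Keras.
    """
    spatial = input_shape[1:-1]
    new_spatial = tuple(dim - kernel + 1 for dim in spatial)
    return (input_shape[0],) + new_spatial + (filters,)


def flatten_shape(input_shape):
    """Static shape after Flatten: keep entry 0, multiply the rest."""
    return (input_shape[0], reduce(mul, input_shape[1:], 1))


def dense_input_dim(attention_shape):
    """Input size the final Dense layer is built for, given the shape
    declared by the attention layer and the two 3x3 convolutions after it."""
    shape = conv2d_valid_shape(attention_shape, 64)
    shape = conv2d_valid_shape(shape, 64)
    return flatten_shape(shape)[1]


print(dense_input_dim((20, 20, 1)))        # 1024, from the 3-tuple
print(dense_input_dim((None, 20, 20, 1)))  # 16384, with the batch axis
```

With the 3-tuple, the first 20 is read as the batch size and only one spatial axis remains, so the Dense kernel is built for 16 * 64 = 1024 inputs. The tensor that actually flows through the graph flattens to 16 * 16 * 64 = 16384 values per sample, which matches In[0]: [1,16384] against In[1]: [1024,10].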

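While trying to track down the real issue and solve your problem, I noticed that the code you quoted has some other problems as well. I fixed those too. You can find a reimplemented version of the code in this repository.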
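
For reference, here is an independent NumPy sketch of the glimpse computation from the blog post, with the batch axis kept explicit throughout. It is not the code from the repository. It assumes that u, s and d are given in pixel units with shape (B,) and that images are channels last. The function signatures are my own choice for illustration.

```python
import numpy as np


def gaussian_mask(u, s, d, R, C):
    """Build a batch of attention masks with shape (B, R, C).

    u, s, d: arrays of shape (B,) holding the first centre, the standard
    deviation and the stride between centres. Row r of each mask is a
    Gaussian over the C input coordinates, centred at u + r * d.
    """
    u, s, d = (np.asarray(v, dtype=np.float64).reshape(-1, 1, 1) for v in (u, s, d))
    rows = np.arange(R, dtype=np.float64).reshape(1, R, 1)
    cols = np.arange(C, dtype=np.float64).reshape(1, 1, C)
    centres = u + rows * d                              # (B, R, 1)
    mask = np.exp(-0.5 * ((cols - centres) / s) ** 2)   # (B, R, C)
    # eps keeps the division stable when a Gaussian falls outside the image
    return mask / (mask.sum(axis=2, keepdims=True) + 1e-8)


def gaussian_glimpse(img, u_y, s_y, d_y, u_x, s_x, d_x, h, w):
    """Extract (B, h, w, ch) glimpses from images of shape (B, H, W, ch)."""
    _, H, W, _ = img.shape
    Ay = gaussian_mask(u_y, s_y, d_y, h, H)   # (B, h, H)
    Ax = gaussian_mask(u_x, s_x, d_x, w, W)   # (B, w, W)
    # glimpse[b, i, j, c] = sum over y, x of Ay[b, i, y] * img[b, y, x, c] * Ax[b, j, x]
    return np.einsum('biy,byxc,bjx->bijc', Ay, img, Ax)


img = np.ones((2, 28, 28, 1))
start = np.array([4.0, 4.0])
unit = np.ones(2)
glimpse = gaussian_glimpse(img, start, unit, unit, start, unit, unit, h=20, w=20)
print(glimpse.shape)  # (2, 20, 20, 1)
```

The output has rank 4, with the batch axis first, which is the shape a custom layer's compute_output_shape would need to report as (None, 20, 20, 1).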

PS: You may have already spotted this problem yourself, since you had found a clue:

unless I use the MaxPool2D that is now commented out, in which case I get "IndexError: tuple index out of range"

This error message should already have alerted you that the output shape of your layer might be incorrect / not what you expect.
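
A rough illustration of where that IndexError could come from, assuming the pooling layer's shape inference reads four entries from the shape it is handed (as a channels-last 2D pooling layer has to). The functions below are hypothetical stand-ins, not Keras code.

```python
def maxpool2d_shape(input_shape, pool=2):
    """Static output shape of a 2x2 max pool on (batch, rows, cols, channels)."""
    rows = input_shape[1] // pool
    cols = input_shape[2] // pool
    return (input_shape[0], rows, cols, input_shape[3])


def try_pool(shape):
    """Return the pooled shape, or the error text if the shape is too short."""
    try:
        return maxpool2d_shape(shape)
    except IndexError as err:
        return "IndexError: " + str(err)


# Shape inferred for the first Conv2D when the attention layer declares (20, 20, 1)
print(try_pool((20, 18, 64)))        # IndexError: tuple index out of range
# Shape inferred for the same Conv2D once the batch axis is declared
print(try_pool((None, 18, 18, 64)))  # (None, 9, 9, 64)
```

The rank-3 tuple has no fourth entry to read the channel count from, so the failure appears while the model is being built, before any data is fed in.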

Keras: Matrix size-incompatible: In[0]: [1,16384], In[1]: [1024,10]
