Question

I'm trying to write a layer that merges 2 tensors with the following formula:

[formula image not preserved]

x[0] and x[1] both have shape (?, 1, 500).

M is a 500 x 500 matrix.

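I want the output to be (?, 500, 500), which seems theoretically feasible to me. For each pair of inputs, e.g. (1, 1, 500) and (1, 1, 500), the layer would output (1, 500, 500). Since batch_size is variable (dynamic), the output must be (?, 500, 500).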

However, I know very little about axes. I have tried every combination of axes, but none of them makes sense.

I tried numpy.tensordot and keras.backend.batch_dot (TensorFlow). If batch_size is fixed, say a = (100, 1, 500), then batch_dot(a, M, (2, 0)) gives an output of (100, 1, 500).
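A small NumPy sketch of the fixed-batch case, using numpy.tensordot as mentioned above: contracting axis 2 of a (batch, 1, 500) array with axis 0 of a (500, 500) matrix leaves the batch axis untouched, so the same call works for any batch size. The helper name contract_last_axis and the random data are illustrative only.

```python
import numpy as np

def contract_last_axis(a, M):
    """Contract axis 2 of a (batch, 1, n) array with axis 0 of an (n, n) matrix."""
    # tensordot sums over a's axis 2 and M's axis 0; the remaining axes
    # (batch, 1) from a and (n,) from M are kept, in that order.
    return np.tensordot(a, M, axes=(2, 0))

n = 500
M = np.random.rand(n, n)
for batch_size in (1, 100, 7):
    a = np.random.rand(batch_size, 1, n)
    out = contract_last_axis(a, M)
    print(batch_size, out.shape)  # shape is (batch_size, 1, n)
```

Note that this only multiplies x by M; it says nothing about how the second input enters the formula.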

I'm new to Keras, sorry for such a silly question, but I've spent two days trying to figure this out and it's driving me crazy :(

    def call(self,x):
            input1 = x[0]
            input2 = x[1]
            #self.M is defined in build function
            output = K.batch_dot(...)
            return output

Update

Sorry for the late reply. I tried Daniel's answer with TensorFlow as the Keras backend, and it still raised a ValueError about unequal dimensions.

I tried the same code with Theano as the backend, and now it works.

>>> import numpy as np
>>> import keras.backend as K
Using Theano backend.
>>> from keras.layers import Input
>>> x1 = Input(shape=[1,500,])
>>> M = K.variable(np.ones([1,500,500]))
>>> firstMul = K.batch_dot(x1, M, axes=[1,2])

I don't know how to print a tensor's shape in Theano. It is definitely harder for me than TensorFlow... but it does work.

To understand why, I looked through the batch_dot code of both backends, TensorFlow and Theano. Here are the differences.

In this case, x = (?, 1, 500), y = (1, 500, 500), axes = [1, 2].

In tensorflow_backend:

return tf.matmul(x, y, adjoint_a=True, adjoint_b=True)

In theano_backend:

return T.batched_tensordot(x, y, axes=axes)

(The code that follows, which changes out._keras_shape, does not affect the value of out.)
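The TensorFlow branch quoted above can be mimicked in NumPy to see where the ValueError comes from. With adjoint_a=True and adjoint_b=True, tf.matmul multiplies the conjugate transposes of the last two axes of each operand; for real data that is just a transpose, so np.swapaxes stands in for the adjoint here (an assumption that holds only for real-valued arrays). Sizes are shrunk from 500 to 5. The transposed x has shape (batch, n, 1), and its inner dimension 1 cannot match the n rows of the transposed y, so the product is rejected.

```python
import numpy as np

def matmul_both_adjoint(x, y):
    """NumPy stand-in for tf.matmul(x, y, adjoint_a=True, adjoint_b=True) on real arrays."""
    # For real-valued arrays the adjoint is a transpose of the last two axes.
    xt = np.swapaxes(x, -1, -2)
    yt = np.swapaxes(y, -1, -2)
    # np.matmul broadcasts the leading (batch) axes and multiplies the last two.
    return np.matmul(xt, yt)

n = 5                      # stands in for 500
x = np.ones((4, 1, n))     # (?, 1, 500) with a batch of 4
y = np.ones((1, n, n))     # (1, 500, 500)

try:
    matmul_both_adjoint(x, y)   # (4, n, 1) @ (1, n, n): inner dims 1 and n differ
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```

Theano's batched_tensordot takes a different code path, which is where the two backends diverge.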

Answer 1

You have to choose which axes the multiplication uses in the batch_dot function.

Axis 0 - the batch dimension, which is your ?
Axis 1 - the dimension you said has length 1
Axis 2 - the last dimension, of size 500

You won't change the batch dimension, so you will always use batch_dot with axes=[1,2].
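To make the axis numbering concrete, here is a loop-based NumPy sketch of what batch_dot computes, based on my reading of the Keras documentation rather than on Keras's own code: for every sample b it calls np.tensordot on x[b] and y[b], contracting axes[0] of x with axes[1] of y, where axis numbers count the batch axis as axis 0. The helper name batch_dot_reference is hypothetical, and it ignores the extra reshaping Keras applies to very low-rank results. It makes one rule visible: the two contracted axes must have the same length, and for x of shape (?, 1, 500), axis 1 has length 1 while axis 2 has length 500.

```python
import numpy as np

def batch_dot_reference(x, y, axes):
    """Hypothetical per-sample reference for batch_dot semantics.

    Axis numbers include the batch axis (axis 0), as in Keras; the batch
    axis itself is never contracted.
    """
    ax, ay = axes
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same batch size")
    if x.shape[ax] != y.shape[ay]:
        raise ValueError(
            f"contracted axes differ in length: {x.shape[ax]} vs {y.shape[ay]}")
    # Inside the loop each sample has lost its batch axis, hence the -1 shift.
    per_sample = [np.tensordot(xb, yb, axes=(ax - 1, ay - 1)) for xb, yb in zip(x, y)]
    return np.stack(per_sample)

batch, n = 3, 4
x = np.random.rand(batch, 1, n)   # (?, 1, n)
M = np.random.rand(batch, n, n)   # M already repeated to (?, n, n)

out = batch_dot_reference(x, M, axes=(2, 1))  # contract x's length-n axis with M's rows
print(out.shape)                               # (3, 1, 4)

try:
    batch_dot_reference(x, M, axes=(1, 2))     # length 1 vs length n
except ValueError as err:
    print(err)
```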

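But for this to work, you must adjust M to be (?, 500, 500). To do that, define M as (1, 500, 500) rather than (500, 500), and repeat it along the first axis for the batch size: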

import keras.backend as K

#Being M with shape (1,500,500), we repeat it.   
BatchM = K.repeat_elements(x=M,rep=batch_size,axis=0)
#Not sure if repeating is really necessary, leaving M as (1,500,500) gives the same output shape at the end, but I haven't checked actual numbers for correctness, I believe it's totally ok. 

#Now we can use batch dot properly:
firstMul = K.batch_dot(x[0], BatchM, axes=[1,2]) #will result in (?,500,500)

#we also need to transpose x[1]:
x1T = K.permute_dimensions(x[1],(0,2,1))

#and the second multiplication:
result = K.batch_dot(firstMul, x1T, axes=[1,2])
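The comment in the code above wonders whether the repetition is really necessary. A quick NumPy check, with np.repeat standing in for K.repeat_elements and np.matmul for the batched product (both substitutions are assumptions, and the sizes are shrunk), suggests that for a plain batched matrix product it is not: np.matmul broadcasts a leading axis of length 1, so M of shape (1, n, n) gives the same numbers as the repeated (batch, n, n) copy. Whether K.batch_dot broadcasts the same way depends on the backend.

```python
import numpy as np

batch, n = 6, 5
x0 = np.random.rand(batch, 1, n)        # (?, 1, n)
M = np.random.rand(1, n, n)             # M stored as (1, n, n)

# Explicit copy along axis 0, analogous to K.repeat_elements(M, rep=batch, axis=0).
batch_M = np.repeat(M, batch, axis=0)   # (batch, n, n)

repeated = np.matmul(x0, batch_M)       # uses the explicit copies
broadcast = np.matmul(x0, M)            # lets NumPy broadcast the length-1 axis

print(repeated.shape, broadcast.shape)  # (6, 1, 5) (6, 1, 5)
print(np.allclose(repeated, broadcast)) # True
```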

Answer 2

I prefer TensorFlow, so over the past few days I tried to figure this out with TensorFlow.

The first approach is very similar to Daniel's solution.

x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(None,3,3))
tf.matmul(x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>

M has to be fed values of a matching shape, (None, 3, 3).

sess = tf.Session()
sess.run(tf.matmul(x,M), feed_dict = {x: [[[1,2,3]]], M: [[[1,2,3],[0,1,0],[0,0,1]]]})
# return : array([[[ 1.,  4.,  6.]]], dtype=float32)
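The same numbers can be checked without a session. np.matmul follows the same batched rule as tf.matmul for these 3D inputs: the last two axes are multiplied and the leading batch axis is carried along.

```python
import numpy as np

x = np.array([[[1, 2, 3]]], dtype=np.float32)                        # shape (1, 1, 3)
M = np.array([[[1, 2, 3], [0, 1, 0], [0, 0, 1]]], dtype=np.float32)  # shape (1, 3, 3)

result = np.matmul(x, M)  # (1, 1, 3) @ (1, 3, 3) -> (1, 1, 3)
print(result)             # [[[1. 4. 6.]]]
```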

Another, simpler approach uses tf.einsum.

x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(3,3))
tf.einsum('ijk,kl->ijl', x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>

Let's feed in some values.

sess.run(tf.einsum('ijk,kl->ijl', x, M), feed_dict = {x: [[[1,2,3]]], M: [[1,2,3],[0,1,0],[0,0,1]]})
# return: array([[[ 1.,  4.,  6.]]], dtype=float32)

Now that M is a 2D tensor, there is no need to give M a batch dimension.
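np.einsum accepts the same subscript string, so the point about a 2D M is easy to try: 'ijk,kl->ijl' sums over k, the length-3 axis shared by x and M, keeps i (batch) and j (the length-1 axis), and reproduces the (1, 1, 3) result. A larger batch works without touching M.

```python
import numpy as np

M = np.array([[1, 2, 3], [0, 1, 0], [0, 0, 1]], dtype=np.float32)  # 2D, no batch axis

x = np.array([[[1, 2, 3]]], dtype=np.float32)         # batch of 1
small = np.einsum('ijk,kl->ijl', x, M)
print(small)                                          # [[[1. 4. 6.]]]

x_batch = np.random.rand(5, 1, 3).astype(np.float32)  # batch of 5, same M
out = np.einsum('ijk,kl->ijl', x_batch, M)
print(out.shape)                                      # (5, 1, 3)
```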

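More importantly, it now seems that problems like this can be solved in TensorFlow with tf.einsum. Does this mean Keras ought to call tf.einsum in some cases? At least, I couldn't find anywhere that Keras calls tf.einsum. In my view, Keras behaves strangely when batch_dot is applied to a 3D tensor and a 2D tensor. In Daniel's answer, he pads M to (1, 500, 500), but inside K.batch_dot() M is automatically adjusted to (500, 500, 1). I found that TensorFlow adjusts shapes using broadcasting rules, but I'm not sure Keras does the same.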
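For comparison, here is what broadcasting a 3D tensor against a 2D one looks like under NumPy's matmul rules (NumPy only; this does not show what Keras or a particular TensorFlow version does). A 2D operand is treated as a single matrix shared by every sample, and leading batch axes broadcast only when they are equal or one of them is 1.

```python
import numpy as np

n = 4
x = np.random.rand(3, 1, n)      # 3D: (batch, 1, n)
M2d = np.random.rand(n, n)       # 2D: one matrix shared by every sample

out = np.matmul(x, M2d)          # M2d is broadcast across the batch axis
print(out.shape)                 # (3, 1, 4)

# Same numbers as multiplying each sample separately.
per_sample = np.stack([xb @ M2d for xb in x])
print(np.allclose(out, per_sample))  # True

# Leading axes only broadcast when they are equal or one of them is 1.
try:
    np.matmul(x, np.random.rand(2, n, n))  # batch 3 vs batch 2
except ValueError as err:
    print("ValueError:", err)
```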

batch_dot with variable batch size in Keras
