Question

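I want to create a custom attention layer that, for the input at any time step, returns the weighted mean of the inputs over all time steps.

For example, I want an input tensor of shape [32,100,2048] to go into the layer and to get back a tensor of shape [32,100,2048]. I wrote the layer as follows: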

import tensorflow as tf

from keras.layers import Layer, Dense

#or

from tensorflow.keras.layers import Layer, Dense


class Attention(Layer):

  def __init__(self, units_att):

     self.units_att = units_att
     self.W = Dense(units_att)
     self.V = Dense(1)
     super().__init__()

  def __call__(self, values):

      t = tf.constant(0, dtype= tf.int32)    
      time_steps = tf.shape(values)[1]
      initial_outputs = tf.TensorArray(dtype=tf.float32, size=time_steps)
      initial_att =  tf.TensorArray(dtype=tf.float32, size=time_steps)

      def should_continue(t, *args):
          return t < time_steps

      def iteration(t, values, outputs, atts):

        score = self.V(tf.nn.tanh(self.W(values)))

        # attention_weights shape == (batch_size, time_step, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        outputs = outputs.write(t, context_vector)
        atts = atts.write(t, attention_weights)
        return t + 1, values, outputs, atts

      t, values, outputs, atts = tf.while_loop(should_continue, iteration,
                                  [t, values, initial_outputs, initial_att])

      outputs = outputs.stack()
      outputs = tf.transpose(outputs, [1,0,2])

      atts = atts.stack()
      atts = tf.squeeze(atts, -1)
      atts = tf.transpose(atts, [1,0,2])
      return t, values, outputs, atts

For input = tf.constant(2, shape=[32, 100, 2048], dtype=tf.float32), I get an output of shape [32,100,2048] in tf2 and [32,None,2048] in tf1.

For input = Input(shape=(None, 2048)), I get an output of shape [None, None, 2048] in tf1, and in tf2 I get the error

TypeError: 'Tensor' object cannot be interpreted as an integer

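Finally, in both cases I can't use the layer in my model, because my model input is Input(shape=(None, 2048)) and I get the error

AttributeError: 'NoneType' object has no attribute '_inbound_nodes'

in tf1, and in tf2 I get the same TypeError as above. I build my model with the Keras functional API.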

Answer 1

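From the code you shared, it looks like you want to implement Bahdanau's attention layer. You want to attend to all the 'values' (the previous layer's output, i.e. all of its hidden states), and your 'query' would be the last hidden state of the decoder. Your code should actually be very simple and look something like this: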

        class Bahdanau(tf.keras.layers.Layer):
            def __init__(self, n):
                super(Bahdanau, self).__init__()
                self.w = tf.keras.layers.Dense(n)
                self.u = tf.keras.layers.Dense(n)
                self.v = tf.keras.layers.Dense(1)
        
            def call(self, query, values):
                query = tf.expand_dims(query, 1)
                e = self.v(tf.nn.tanh(self.w(query) + self.u(values)))
                a = tf.nn.softmax(e, axis=1)
                c = a * values
                c = tf.reduce_sum(c, axis=1)
                return a,c
        
        ##Say we want 10 units in the single layer MLP determining w,u
        attentionlayer = Bahdanau(10)
        ##Call with i/p: decoderstate @ t-1 and all encoder hidden states
        a, c = attentionlayer(stminus1, hj)

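Note that we have not specified tensor shapes anywhere in the code. This code returns a context tensor of the same size as 'stminus1', which is the 'query'. It does so after attending over all the 'values' (all the encoder hidden states) using Bahdanau's attention mechanism.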

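So, assuming your batch size is 32, time steps = 100 and embedding dimension = 2048, the shape of stminus1 should be (32, 2048) and the shape of hj should be (32, 100, 2048). The output context will have shape (32, 2048). We also return the 100 attention weights, in case you want to route them to a nice display.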
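To check those shapes, here is a minimal self-contained sketch that runs the layer eagerly in TensorFlow 2. The random tensors standing in for `stminus1` and `hj` are assumptions made for illustration, not data from the question, and the context is computed by multiplying the weights with `values`.

```python
import tensorflow as tf


class Bahdanau(tf.keras.layers.Layer):
    """Additive (Bahdanau-style) attention over a sequence of values."""

    def __init__(self, n):
        super().__init__()
        self.w = tf.keras.layers.Dense(n)  # projects the query
        self.u = tf.keras.layers.Dense(n)  # projects each value
        self.v = tf.keras.layers.Dense(1)  # one score per time step

    def call(self, query, values):
        # query: (batch, dim) -> (batch, 1, dim) so it broadcasts over time
        query = tf.expand_dims(query, 1)
        # e: (batch, time, 1), unnormalised score for every time step
        e = self.v(tf.nn.tanh(self.w(query) + self.u(values)))
        # a: (batch, time, 1), weights that sum to 1 along the time axis
        a = tf.nn.softmax(e, axis=1)
        # c: (batch, dim), weighted sum of the values over time
        c = tf.reduce_sum(a * values, axis=1)
        return a, c


# Hypothetical stand-ins for the decoder state and the encoder outputs.
stminus1 = tf.random.normal([32, 2048])
hj = tf.random.normal([32, 100, 2048])

a, c = Bahdanau(10)(stminus1, hj)
print(a.shape, c.shape)
```

The weights come back with a trailing axis of size 1, so `tf.squeeze(a, -1)` gives the 100 weights per example as a (32, 100) tensor if that is easier to plot.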

This is the simplest version of 'attention'. If you have any other intent, please let me know and I will reformat my answer. For more details, see https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e
