I am trying to implement, in Keras, the custom attention layer described in the paper "3-D Convolutional Recurrent Neural Networks with Attention Model for Speech Emotion Recognition". The code is available in a GitHub repo.
There, the attention layer is written as a function in the old TensorFlow style:
```python
import tensorflow as tf


def attention(inputs, attention_size, time_major=False, return_alphas=False):
    """
    Attention mechanism layer which reduces RNN/Bi-RNN outputs with an attention vector.

    The idea was proposed in the article by Z. Yang et al., "Hierarchical Attention Networks
    for Document Classification", 2016: http://www.aclweb.org/anthology/N16-1174.
    Variable notation is also inherited from the article.

    Args:
        inputs: The attention inputs.
            Matches outputs of an RNN/Bi-RNN layer (not the final state):
            In case of an RNN, this must be the RNN outputs `Tensor`:
                If time_major == False (default), this must be a tensor of shape:
                    `[batch_size, max_time, cell.output_size]`.
                If time_major == True, this must be a tensor of shape:
                    `[max_time, batch_size, cell.output_size]`.
            In case of a Bidirectional RNN, this must be a tuple (outputs_fw, outputs_bw)
            containing the forward and the backward RNN outputs `Tensor`.
                If time_major == False (default),
                    outputs_fw is a `Tensor` shaped:
                        `[batch_size, max_time, cell_fw.output_size]`
                    and outputs_bw is a `Tensor` shaped:
                        `[batch_size, max_time, cell_bw.output_size]`.
                If time_major == True,
                    outputs_fw is a `Tensor` shaped:
                        `[max_time, batch_size, cell_fw.output_size]`
                    and outputs_bw is a `Tensor` shaped:
                        `[max_time, batch_size, cell_bw.output_size]`.
        attention_size: Linear size of the attention weights.
        time_major: The shape format of the `inputs` Tensors.
            If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`.
            If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`.
            Using `time_major = True` is a bit more efficient because it avoids
            transposes at the beginning and end of the RNN calculation. However,
            most TensorFlow data is batch-major, so by default this function
            accepts input and emits output in batch-major form.
        return_alphas: Whether to return the attention coefficients along with the
            layer's output. Used for visualization purposes.

    Returns:
        The attention output `Tensor`.
        In case of an RNN, this will be a `Tensor` shaped:
            `[batch_size, cell.output_size]`.
        In case of a Bidirectional RNN, this will be a `Tensor` shaped:
            `[batch_size, cell_fw.output_size + cell_bw.output_size]`.
    """
    if isinstance(inputs, tuple):
        # In case of a Bi-RNN, concatenate the forward and the backward RNN outputs.
        inputs = tf.concat(inputs, 2)

    if time_major:
        # (T,B,D) => (B,T,D)
        inputs = tf.transpose(inputs, [1, 0, 2])

    hidden_size = inputs.shape[2].value  # D value - hidden size of the RNN layer

    # Trainable parameters
    W_omega = tf.Variable(tf.random.normal([hidden_size, attention_size], stddev=0.1))
    b_omega = tf.Variable(tf.random.normal([attention_size], stddev=0.1))
    u_omega = tf.Variable(tf.random.normal([attention_size], stddev=0.1))

    # Applying a fully connected layer with non-linear activation to each of the B*T timestamps;
    # the shape of `v` is (B,T,D)*(D,A)=(B,T,A), where A=attention_size
    # v = tf.tanh(tf.tensordot(inputs, W_omega, axes=1) + b_omega)
    v = tf.sigmoid(tf.tensordot(inputs, W_omega, axes=1) + b_omega)

    # For each timestamp, its vector of size A from `v` is reduced with the `u` vector
    vu = tf.tensordot(v, u_omega, axes=1)  # (B,T) shape
    alphas = tf.nn.softmax(vu)             # (B,T) shape also

    # Output of the (Bi-)RNN is reduced with the attention vector; the result has (B,D) shape
    output = tf.reduce_sum(input_tensor=inputs * tf.expand_dims(alphas, -1), axis=1)

    if not return_alphas:
        return output
    else:
        return output, alphas
```
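For context, this is the kind of TF1 graph the function is meant to plug into (a minimal, hypothetical sketch with made-up shapes and names, not the repo's actual model):

```python
import tensorflow as tf

# Hypothetical TF1 usage sketch: pool variable-length RNN outputs
# into a fixed-size vector with the attention() function above.
max_time, num_features, num_units, attention_size = 100, 40, 128, 50

inputs_ph = tf.placeholder(tf.float32, [None, max_time, num_features])
cell = tf.nn.rnn_cell.LSTMCell(num_units)
rnn_outputs, _ = tf.nn.dynamic_rnn(cell, inputs_ph, dtype=tf.float32)

# rnn_outputs: (B, T, num_units) -> pooled: (B, num_units), alphas: (B, T)
pooled, alphas = attention(rnn_outputs, attention_size, return_alphas=True)
```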
I am trying to implement it as a custom Keras layer:
```python
import tensorflow as tf
import keras.backend as K
from keras.engine.topology import Layer


class att_Layer1D(Layer):
    def __init__(self, attention_dim, **kwargs):
        self.attention_dim = attention_dim
        super(att_Layer1D, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create trainable weight variables for this layer.
        # input_shape is (batch_size, max_time, hidden_size).
        assert len(input_shape) >= 3
        input_dim = input_shape[1:]
        self.kernel1 = self.add_weight(shape=(input_dim[1], self.attention_dim),
                                       name='kernel1',
                                       initializer='uniform',
                                       trainable=True)
        # These two fail: input_shape[0] is the batch dimension,
        # which is unknown (None) when build() runs.
        self.b_omega = self.add_weight(shape=(input_shape[0], self.attention_dim),
                                       name='kernel2',
                                       initializer='uniform',
                                       trainable=True)
        self.u_omega = self.add_weight(shape=(input_shape[0], self.attention_dim),
                                       name='kernel3',
                                       initializer='uniform',
                                       trainable=True)
        super(att_Layer1D, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        v = K.sigmoid(K.dot(x, self.kernel1) + self.b_omega)
        vu = K.dot(v, self.u_omega)
        alphas = K.softmax(vu)
        output = tf.reduce_sum(x * K.expand_dims(alphas, -1), 1)
        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[2])

    def get_config(self):
        config = {
            'attention_dim': self.attention_dim,
        }
        base_config = super(att_Layer1D, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
```
However, I am running into trouble adding the weights `u_omega` and `b_omega`: Keras cannot initialize a weight whose shape contains the unknown batch dimension (`input_shape[0]` is `None` at `build()` time).
Is there any workaround? Any help would be appreciated.
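From the original function, `b_omega` and `u_omega` are plain `[attention_size]` vectors that broadcast over the batch and time dimensions, so I suspect the layer's weights should not involve the batch size at all. A sketch of what I mean (untested, assuming standalone Keras with the TensorFlow backend):

```python
import keras.backend as K
from keras.engine.topology import Layer


class att_Layer1D(Layer):
    def __init__(self, attention_dim, **kwargs):
        self.attention_dim = attention_dim
        super(att_Layer1D, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) >= 3
        hidden_size = input_shape[2]  # D
        self.W_omega = self.add_weight(shape=(hidden_size, self.attention_dim),
                                       name='W_omega',
                                       initializer='uniform',
                                       trainable=True)
        # Shaped exactly like the original TF variables: plain (A,) vectors.
        # Broadcasting handles batch and time, so batch_size is never needed.
        self.b_omega = self.add_weight(shape=(self.attention_dim,),
                                       name='b_omega',
                                       initializer='uniform',
                                       trainable=True)
        self.u_omega = self.add_weight(shape=(self.attention_dim,),
                                       name='u_omega',
                                       initializer='uniform',
                                       trainable=True)
        super(att_Layer1D, self).build(input_shape)

    def call(self, x):
        # (B,T,D) . (D,A) -> (B,T,A); b_omega broadcasts over B and T
        v = K.sigmoid(K.dot(x, self.W_omega) + self.b_omega)
        # (B,T,A) . (A,1) -> (B,T,1) -> (B,T)
        vu = K.squeeze(K.dot(v, K.expand_dims(self.u_omega, -1)), -1)
        alphas = K.softmax(vu)
        # Attention-weighted sum over the time axis: (B,T,D) -> (B,D)
        return K.sum(x * K.expand_dims(alphas, -1), axis=1)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[2])

    def get_config(self):
        config = {'attention_dim': self.attention_dim}
        base_config = super(att_Layer1D, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
```

If that is the right direction, the layer would then be used like any other Keras layer, e.g. `att_Layer1D(50)(lstm_out)`.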