Applying multiple LSTM self-attention layers

Date: 2021-01-30 08:55:28

Tags: deep-learning transformer attention-model

This is a binary classifier; the dataset has shape (4917, 50, 136) as (batch, step, features).

However, the dataset is imbalanced: class 0 contains 1401 samples and the rest belong to class 1. Moreover, class 1 is actually made up of 7 other classes.
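Given the imbalance, I assume I would also want to weight the classes during training. A rough sketch of what I have in mind, assuming integer class labels and sklearn's compute_class_weight (the helper name is just illustrative):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def balanced_class_weights(labels):
    # labels: 1-D array of integer class ids, e.g. 0/1
    classes = np.unique(labels)
    weights = compute_class_weight('balanced', classes=classes, y=labels)
    return dict(zip(classes, weights))

# With 1401 samples of class 0 and 3516 of class 1 this gives roughly
# {0: 1.75, 1: 0.70}, which would then be passed to
# model.fit(..., class_weight=balanced_class_weights(labels)).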

Applying the LSTM self-attention model below only reaches a validation accuracy of 0.66, so I would like to improve it by stacking multiple self-attention layers. I guess that would make it something like a transformer.

How can I implement this?

import numpy as np
# (imports assume the tf.keras API)
from tensorflow import keras
from tensorflow.keras.layers import (Input, Masking, Dense, Dropout,
                                     Bidirectional, LSTM, Activation, dot)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2


def attention_model(X_train, y_train, X_test, y_test, num_classes,
                    dropout=0.2, batch_size=68, learning_rate=0.0001,
                    epochs=20, optimizer='Adam'):

    Dense_unit = 9
    LSTM_unit = 9
    
    dense_reg = 0.01
    attention_param = LSTM_unit * 2
    attention_init_value = 1.0 / attention_param

    # Constant, uniform attention query that is fed to the attention branch
    u_train = np.full((X_train.shape[0], attention_param),
                      attention_init_value, dtype=np.float32)
    u_test = np.full((X_test.shape[0], attention_param),
                     attention_init_value, dtype=np.float32)

    with keras.backend.name_scope('BLSTMLayer'):
        # Bi-directional Long Short-Term Memory for learning the temporal aggregation
        input_feature = Input(shape=(X_train.shape[1],X_train.shape[2]))
        x = Masking(mask_value=0)(input_feature)
        x = Dense(Dense_unit,kernel_regularizer=l2(dense_reg), activation='relu')(x)
        x = Dropout(dropout)(x)
        x = Dense(Dense_unit,kernel_regularizer=l2(dense_reg),activation='relu')(x)
        x = Dropout(dropout)(x)
        x = Dense(Dense_unit,kernel_regularizer=l2(dense_reg),activation='relu')(x)
        x = Dropout(dropout)(x)
        x = Dense(Dense_unit,kernel_regularizer=l2(dense_reg), activation='relu')(x)
        x = Dropout(dropout)(x)


        y = Bidirectional(LSTM(LSTM_unit,
                               activity_regularizer=l2(0.0029),
                               kernel_regularizer=l2(0.002),
                               recurrent_regularizer=l2(0.002),
                               return_sequences=True,
                               dropout=dropout))(x)


    with keras.backend.name_scope('AttentionLayer'):
        # Logistic regression for learning the attention parameters with a standalone feature as input
        input_attention = Input(shape=(LSTM_unit * 2,))
        u = Dense(LSTM_unit * 2, activation='softmax')(input_attention)

        # To compute the final weights for the frames which sum to unity
        alpha = dot([u, y], axes=-1)  # inner prod.
        alpha = Activation('softmax')(alpha)

    with keras.backend.name_scope('WeightedPooling'):
        # Weighted pooling to get the utterance-level representation
        z = dot([alpha, y], axes=1)

    # Get posterior probability for each emotional class
    output = Dense(num_classes, activation='softmax')(z)

    model = Model(inputs=[input_attention, input_feature], outputs=output)

    # Compile and return the model; the loss assumes one-hot encoded targets
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer, metrics=['accuracy'])

    return model
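What I have in mind is to replace the single attention-and-pooling step with a stack of self-attention blocks, roughly like a transformer encoder. A rough, untested sketch of what I mean, assuming tf.keras's MultiHeadAttention and LayerNormalization layers (the block count, head count and layer sizes are just placeholders, and there is no positional encoding or masking here):

from tensorflow.keras.layers import (Input, Dense, Dropout, Add,
                                     MultiHeadAttention, LayerNormalization,
                                     GlobalAveragePooling1D)
from tensorflow.keras.models import Model

def stacked_attention_model(input_shape, num_classes,
                            num_blocks=2, num_heads=2, d_model=18,
                            dropout=0.2):
    # input_shape = (steps, features), e.g. (50, 136)
    inputs = Input(shape=input_shape)
    x = Dense(d_model, activation='relu')(inputs)   # project features to d_model

    # Several self-attention blocks in a row (query, key and value all come
    # from the same sequence), each with a residual connection and layer norm
    for _ in range(num_blocks):
        attn = MultiHeadAttention(num_heads=num_heads, key_dim=d_model,
                                  dropout=dropout)(x, x)
        x = LayerNormalization()(Add()([x, attn]))
        ff = Dropout(dropout)(Dense(d_model, activation='relu')(x))
        x = LayerNormalization()(Add()([x, ff]))

    # Pool over the time axis and classify
    z = GlobalAveragePooling1D()(x)
    outputs = Dense(num_classes, activation='softmax')(z)
    return Model(inputs, outputs)

Is stacking blocks like this, on top of or instead of the BLSTM above, the right way to go about it?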

0 Answers