Question

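I am building a Bi-LSTM model for NER and want to add an attention layer. What is the correct way to fit an attention layer into it? Two layers are available: tf.keras.layers.Attention and tf.keras.layers.AdditiveAttention. I think attention uses all the hidden states of the LSTM along with the final output, but I am not sure. My code is below. Where should the attention layer go? The documentation did not help me, and all the other answers use their own CustomAttention() layer.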

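import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Input, Embedding, Lambda, Bidirectional, LSTM, Dense
from tensorflow.keras.models import Model
# ElmoEmbedding, CRF, CRF_TF2 and ModelWithCRFLoss are assumed to be defined or imported elsewhere

def build_model(vocab_size: int, n_tags: int, max_len: int, emb_dim: int = 300, emb_weights=False,
                use_elmo: bool = False, use_crf: bool = False, train_embedding: bool = False):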
    '''
    Build and return a Keras model based on the given inputs
    args:
        vocab_size: Number of words in the vocabulary
        n_tags: Number of unique 'y' tags present in the data
        max_len: Maximum length of sentence to use
        emb_dim: Size of embedding dimension
        emb_weights: pretrained Embedding Weights for Embedding Layer. if False, use default
        use_elmo: Whether to use Elmo Embeddings
        use_crf: Whether to use the CRF layer
        train_embedding: Whether to train the embeddings weights
    out:
        Keras model. See comments for each type of loss function and metric to use
    '''
    assert not (isinstance(emb_weights, np.ndarray) and use_elmo), "Either provide embedding weights or use ELMo, not both"
    
    inputs = Input(shape=(max_len,))
    
    if isinstance(emb_weights,np.ndarray):
        x = Embedding(trainable=train_embedding,input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True, embeddings_initializer=keras.initializers.Constant(emb_weights))(inputs)
    elif use_elmo:
        x = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(inputs) # Lambda will create a layer based on the function defined  
    else: # use default Embeddings
        x = Embedding(input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True,)(inputs) # n_words = vocab_size
    
    x = Bidirectional(LSTM(units=50, return_sequences=True,recurrent_dropout=0.1))(x)
    # I think the attention layer will come here but I'm not sure exactly how to implement it here.
    
    if use_crf: 
        try: # If you can not modify your crf.py file, it'll use the second package 
            x = Dense(50, activation="relu")(x) # otherwise use TimeDistributed(Dense(50, activation="relu"))(x)
            crf = CRF(n_tags) # Instantiate CRF layer
            out = crf(x) 
            model = Model(inputs, out)
            return model # use crf_loss and crf_accuracy at compile time
        
        except Exception: # fall back to the second CRF package
            output = Dense(n_tags, activation=None)(x)
            crf = CRF_TF2(dtype='float32') # it does not take any n_tags. See the documentation.
            output = crf(output)
            base_model = Model(inputs, output)
            model = ModelWithCRFLoss(base_model) # It has Loss and Metric already. Change the model if you want to use DiceLoss.
            return model # Do not use any metric or loss with this model.compile(). Just use Optimizer and run training

    else:
        out = Dense(n_tags, activation="softmax")(x) # Wrap it around TimeDistributed(Dense()) if you have old versions
        model = Model(inputs, out)
        return model # use "sparse_categorical_crossentropy", "accuracy"

How do I use the Attention or AdditiveAttention layer that TensorFlow (Keras) provides for an NER task?
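For reference, here is a minimal sketch of one possible placement, not a confirmed answer. It assumes self-attention over the Bi-LSTM outputs: both built-in layers are called on a list `[query, value]`, so the full sequence of hidden states is passed as both. The function name `build_attention_tagger` and the smaller `emb_dim` are made up for illustration. `mask_zero`, ELMo and the CRF branch are left out to keep the sketch short.

```python
import numpy as np
from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                     Attention, AdditiveAttention,
                                     Concatenate, Dense)
from tensorflow.keras.models import Model


def build_attention_tagger(vocab_size, n_tags, max_len, emb_dim=32, additive=False):
    """Hypothetical helper: Bi-LSTM tagger with a built-in Keras attention layer."""
    inputs = Input(shape=(max_len,))
    x = Embedding(input_dim=vocab_size, output_dim=emb_dim)(inputs)

    # return_sequences=True keeps one hidden state per token: (batch, max_len, 100)
    h = Bidirectional(LSTM(units=50, return_sequences=True))(x)

    # Assumption: self-attention, i.e. query = value = the Bi-LSTM outputs.
    # Attention is dot-product (Luong-style); AdditiveAttention is Bahdanau-style.
    attn_layer = AdditiveAttention() if additive else Attention()
    context = attn_layer([h, h])  # (batch, max_len, 100), one context vector per token

    # Keep the original token states next to their attention context.
    x = Concatenate()([h, context])  # (batch, max_len, 200)

    # Per-token tag distribution, as in the non-CRF branch of the question.
    out = Dense(n_tags, activation="softmax")(x)
    return Model(inputs, out)


model = build_attention_tagger(vocab_size=100, n_tags=5, max_len=12)
batch = np.random.randint(1, 100, size=(2, 12))
probs = model.predict(batch, verbose=0)
print(probs.shape)  # one row of tag probabilities per token
```

In `build_model` above, the same two lines (`attn_layer([h, h])` followed by `Concatenate`) would sit right after the `Bidirectional(LSTM(...))` call, before the CRF or softmax head. Whether concatenating the context helps NER accuracy is something to check empirically.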

0 answers: