I want to use BERT word-vector embeddings in the embedding layer of an LSTM instead of the usual default embedding layer. Is there a way to do this?
Answer 0 (score: 1)
Hope these links help:
Huggingface Transformers with TF 2.0 (TPU training) and embeddings: https://www.kaggle.com/abhilash1910/nlp-workshop-2-ml-india
Contextual similarity with BERT embeddings (PyTorch): https://github.com/abhilash1910/BERTSimilarity
To generate sentence embeddings with BERT or a BERT variant, it is recommended to pick the right layer. In some cases, the following pattern can be used to obtain the embeddings (TF 2.0 / Keras):
import tensorflow as tf
import transformers

# Pretrained BERT plus the two integer inputs (token ids and attention mask) it expects
transformer_model = transformers.TFBertModel.from_pretrained('bert-large-uncased')
input_ids = tf.keras.layers.Input(shape=(128,), name='input_token', dtype='int32')
input_masks_ids = tf.keras.layers.Input(shape=(128,), name='masked_token', dtype='int32')

# [0] is the sequence output: one contextual embedding per token
X = transformer_model(input_ids, input_masks_ids)[0]
X = tf.keras.layers.Dropout(0.2)(X)
X = tf.keras.layers.Dense(6, activation='softmax')(X)
model = tf.keras.Model(inputs=[input_ids, input_masks_ids], outputs=X)
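For contextual similarity between two pieces of text, BERT features for each text can be compared with cosine distance. Note that transformer_embedding, text1, and text2 are not defined in the snippet below; a minimal sketch of such a helper, assuming it wraps the transformers 'feature-extraction' pipeline and returns one vector per token of the input, could look like this:

import numpy as np
from transformers import AutoTokenizer, pipeline

# Assumed helper (not part of the transformers API): extract token-level features
def transformer_embedding(name, text, model_class):
    model = model_class.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    extractor = pipeline('feature-extraction', model=model, tokenizer=tokenizer)
    # (1, num_tokens, hidden_size) -> (num_tokens, hidden_size)
    return np.squeeze(np.array(extractor(text)))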
from scipy.spatial.distance import cosine
from transformers import AutoTokenizer, pipeline, TFBertModel

text1 = 'BERT is a contextual language model.'       # example input
text2 = 'BERT produces contextual word embeddings.'  # example input
bert_features1 = transformer_embedding('bert-base-uncased', text1, TFBertModel)
bert_features2 = transformer_embedding('bert-base-uncased', text2, TFBertModel)
distance = 1 - cosine(bert_features1[0], bert_features2[0])  # 1 - cosine distance = cosine similarity of the first ([CLS]) token vectors
print(distance)
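Finally, since the question asks about replacing an LSTM's embedding layer with BERT word vectors, one option (a sketch under the same 128-token input assumption as above, not taken from the linked notebooks) is to feed BERT's token-level sequence output directly into a Keras LSTM layer in place of a trainable Embedding layer:

import tensorflow as tf
import transformers

bert = transformers.TFBertModel.from_pretrained('bert-base-uncased')
input_ids = tf.keras.layers.Input(shape=(128,), name='input_token', dtype='int32')
input_masks = tf.keras.layers.Input(shape=(128,), name='masked_token', dtype='int32')

# Sequence output: one contextual vector per token, used instead of an Embedding layer
sequence_output = bert(input_ids, attention_mask=input_masks)[0]
x = tf.keras.layers.LSTM(128)(sequence_output)          # LSTM consumes the BERT token embeddings
x = tf.keras.layers.Dense(6, activation='softmax')(x)   # same 6-class head as above (assumption)
lstm_model = tf.keras.Model(inputs=[input_ids, input_masks], outputs=x)

Setting bert.trainable = False before building the model freezes the BERT weights so that only the LSTM and the Dense head are trained, which is the closest analogue to using fixed pretrained embeddings.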
Thanks.