Huggingface - the Dropout layer in the TFDistilBertForSequenceClassification pretrained model

Asked: 2021-06-29 05:04:15

Tags: tensorflow huggingface-transformers

What is the purpose of the Dropout layer in the TFDistilBertForSequenceClassification pretrained model, and why is the last layer not a softmax or sigmoid?

The TFDistilBertForSequenceClassification pretrained model shows a Dropout layer as the last entry in its summary.

import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(
    optimizer=optimizer,
    # loss=self.model.compute_loss,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)
model.summary()

Layer (type)                 Output Shape              Param #   
=================================================================
distilbert (TFDistilBertMain multiple                  66362880  
_________________________________________________________________
pre_classifier (Dense)       multiple                  590592    
_________________________________________________________________
classifier (Dense)           multiple                  1538      
_________________________________________________________________
dropout_99 (Dropout)         multiple                  0         <------------
=================================================================
Total params: 66,955,010
Trainable params: 592,130
Non-trainable params: 66,362,880
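As a side note on the summary itself, the Dropout layer contributes no parameters and is only active when the model is called with training=True; at inference it is an identity op. A minimal sketch of that behaviour (the rate 0.2 is an assumption matching DistilBERT's default seq_classif_dropout, as far as I know):

import tensorflow as tf

# Dropout zeroes a random fraction of activations only while training; at inference
# it passes inputs through unchanged, which is why it adds 0 parameters above.
drop = tf.keras.layers.Dropout(rate=0.2)
x = tf.ones((1, 4))
print(drop(x, training=True))   # some entries zeroed, the rest scaled by 1/(1 - rate)
print(drop(x, training=False))  # identical to x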

To feed the output into a categorical or binary cross-entropy loss, I believed the output would come from a softmax or sigmoid. However, the last layer is a Dropout. Please help me understand why.
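For context, this is how I currently turn the model output into probabilities, by applying a softmax explicitly after prediction; a rough sketch (the example sentence is made up):

import tensorflow as tf
from transformers import DistilBertTokenizerFast, TFDistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

# The model returns raw logits; softmax is applied here (or inside the loss), not in the model.
inputs = tokenizer(["a short example sentence"], return_tensors="tf")
logits = model(inputs).logits             # shape (batch_size, num_labels)
probs = tf.nn.softmax(logits, axis=-1)    # explicit softmax for inspection
print(probs.numpy())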

Reference

Fine-tuning with native PyTorch/TensorFlow

import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
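For completeness, the snippet above assumes a train_dataset already built from tokenized text; a hedged sketch of one way such a dataset could be constructed (texts and labels here are placeholder toy data, not from the guide):

import tensorflow as tf
from transformers import DistilBertTokenizerFast

# Toy placeholder data; the referenced guide builds this from a real corpus.
texts = ["first example", "second example"]
labels = [0, 1]

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
encodings = tokenizer(texts, truncation=True, padding=True)

# Pair the tokenized features with integer labels as a tf.data.Dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels))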

0 Answers:

No answers yet