What is the purpose of the Dropout layer in the TFDistilBertForSequenceClassification pretrained model, and why is the last layer not a softmax or sigmoid?
The TFDistilBertForSequenceClassification pretrained model ends with a Dropout layer:
import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
model.distilbert.trainable = False  # base model frozen, matching the Non-trainable params in the summary below
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(
    optimizer=optimizer,
    # loss=model.compute_loss,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.summary()
Layer (type)                 Output Shape              Param #
=================================================================
distilbert (TFDistilBertMain multiple                  66362880
_________________________________________________________________
pre_classifier (Dense)       multiple                  590592
_________________________________________________________________
classifier (Dense)           multiple                  1538
_________________________________________________________________
dropout_99 (Dropout)         multiple                  0          <------------
=================================================================
Total params: 66,955,010
Trainable params: 592,130
Non-trainable params: 66,362,880
To feed the output into a categorical or binary cross-entropy loss, I expected the output to come from a softmax or sigmoid layer. However, the last layer is a Dropout layer. Please help me understand why.
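For context on the from_logits=True argument used in compile above, here is a minimal, self-contained sketch (plain TensorFlow, independent of the model above; the logits and labels are made up for illustration) showing that the softmax is folded into the loss, so the model itself can end in raw logits:

import tensorflow as tf

logits = tf.constant([[2.0, -1.0]])  # hypothetical raw classifier outputs (logits)
labels = tf.constant([0])            # hypothetical integer class label

# Loss computed directly on logits (softmax applied inside the loss):
loss_fn_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_a = loss_fn_logits(labels, logits)

# Equivalent: apply softmax explicitly, then use from_logits=False:
probs = tf.nn.softmax(logits)
loss_fn_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
loss_b = loss_fn_probs(labels, probs)

print(loss_a.numpy(), loss_b.numpy())  # both ≈ 0.0486, i.e. the same loss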
Fine-tuning with native PyTorch/TensorFlow
import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss)  # can also use any keras loss fn
# train_dataset is a prepared tf.data.Dataset; batch_size must not be passed to
# fit() alongside a dataset, since .batch(16) already sets the batch size
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)
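Assuming the fine-tuned model above and a recent transformers version (where the model call returns an output object with a .logits attribute), a sketch of inference: the model emits raw logits, so probabilities are obtained by applying softmax outside the model. The tokenizer and example sentence here are illustrative assumptions:

import tensorflow as tf
from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
inputs = tokenizer("This movie was great!", return_tensors="tf")  # example input (assumption)

# Dropout layers are inactive here because training defaults to False at call time
logits = model(inputs).logits           # raw scores; no softmax inside the model
probs = tf.nn.softmax(logits, axis=-1)  # class probabilities, computed outside the model
print(probs.numpy())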