Question

I'm using the Python BERT model: https://github.com/google-research/bert

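My goal is to build a binary classification model that predicts whether a news headline is relevant to a specific category. I have a training dataset of news headline sentences, each with a binary value indicating whether the headline is valid or invalid.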

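I tried running the run_classifier.py script, but the results I get don't seem to make sense. The test results file has two columns, and every row repeats the same two numbers: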

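For the task_name argument I set the value to cola, which, after reading the BERT paper (https://arxiv.org/pdf/1810.04805.pdf), I suspect is not appropriate. The paper lists several other tasks on pages 14 and 15, but none of them seems suited to binary classification of sentences based on their content.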

How can I properly use BERT to classify sentences? I tried following this guide, https://appliedmachinelearning.blog/2019/03/04/state-of-the-art-text-classification-using-bert-model-predict-the-happiness-hackerearth-challenge/ but it did not produce the results I expected.

Answer 1

For a binary classification task (assuming you used the cola processor), BERT's predictions on the test set are saved to the test_results.tsv file.
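Since the cola processor reads its input from tab-separated files in a fixed layout, here is a minimal sketch of writing headline data in that shape. It assumes the layout used by ColaProcessor in run_classifier.py: train.tsv and dev.tsv have no header, with the label in the second column and the sentence in the fourth, while test.tsv has a header row and the sentence in the second column. The function name write_cola_style and the example headlines are hypothetical; check the processor in your copy of the repository before relying on this.

```python
import os
import tempfile


def clean(text):
    """Collapse tabs and newlines, which would break the TSV layout."""
    return " ".join(text.split())


def write_cola_style(data_dir, train, dev, test):
    """Write headlines in the layout assumed for the cola processor.

    train and dev are lists of (headline, label) pairs with label 0 or 1;
    test is a list of headlines only.
    """
    os.makedirs(data_dir, exist_ok=True)
    for name, rows in (("train.tsv", train), ("dev.tsv", dev)):
        path = os.path.join(data_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            for i, (headline, label) in enumerate(rows):
                # Columns: id, label, unused placeholder, sentence (no header).
                f.write(f"{i}\t{label}\t*\t{clean(headline)}\n")
    with open(os.path.join(data_dir, "test.tsv"), "w", encoding="utf-8") as f:
        # Assumption: the first line of test.tsv is skipped as a header.
        f.write("index\tsentence\n")
        for i, headline in enumerate(test):
            f.write(f"{i}\t{clean(headline)}\n")


# Hypothetical example data, written to a temporary directory.
data_dir = tempfile.mkdtemp()
write_cola_style(
    data_dir,
    train=[("Central bank raises interest rates", 1), ("Ten best beach holidays", 0)],
    dev=[("Stocks fall after earnings report", 1)],
    test=["Local team wins championship"],
)
print(sorted(os.listdir(data_dir)))
```

The directory would then be passed to run_classifier.py as the data directory, with task_name left as cola.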

To interpret test_results.tsv, you need to know its structure.

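The file has as many rows as there are inputs in the test set. The number of columns equals the number of labels. (Since your task is binary classification, there will be two columns: one for label 0 and one for label 1.)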

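The value in each column is a softmax value (the values across all columns of a given row must sum to 1), indicating the probability of the corresponding class (or label).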
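To make the "sums to 1" property concrete, here is a small standalone sketch of softmax in plain Python. The logits are made-up numbers for illustration, not values taken from BERT.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing and does not
    # change the result.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one test input: (score for label 0, score for label 1).
probs = softmax([7.0, -4.6])
print(probs, sum(probs))
```

A large gap between the two logits pushes one probability very close to 1 and the other very close to 0, which is the pattern seen in the file.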

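Looking at your case, 0.9999991 and 9.12E-6 (9.12 * 10^(-6)) are two different numbers, and they add up to ~1. (This can be read as the test input belonging to the class indicated by label 0.)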
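A minimal sketch of turning such a file into predicted labels, assuming each row holds one tab-separated probability per label and no header. The function name read_predictions is hypothetical, and the in-memory sample stands in for the real file; its first row uses the two numbers quoted above and its second row is invented.

```python
import csv
import io


def read_predictions(tsv_file):
    """Return (predicted_label, probabilities) for each row of the file."""
    results = []
    for row in csv.reader(tsv_file, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        probs = [float(value) for value in row]
        # The predicted label is the index of the column with the highest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        results.append((predicted, probs))
    return results


# Stand-in for: open("test_results.tsv", newline="")
sample = io.StringIO("0.9999991\t9.12E-6\n0.02\t0.98\n")
predictions = read_predictions(sample)
for label, probs in predictions:
    print(label, probs)
```

If every row yields the same label with nearly identical probabilities, the model is predicting one class for all inputs, which is worth checking against the label balance in the training data.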

Answer 2

How can I properly use BERT to classify sentences?

Take a look at this complete working code for sentence classification, using IMDB Sentiment Analysis (binary text classification on Google Colab using a GPU).

Basically, you can do this with TensorFlow and keras-bert. The steps involved are:

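Load and convert your custom data.
Load the pretrained model and define the network for fine-tuning.
Train/fine-tune the model on your custom data.
Use the trained model to classify.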

Here is a short code snippet.

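# Imports assumed by this snippet (not shown in the original excerpt).
import numpy as np
import keras  # assumption: standalone Keras; keras-bert can use tf.keras when TF_KERAS=1 is set
from keras_bert import load_trained_model_from_checkpoint
from keras_radam import RAdam

# config_path, checkpoint_path, SEQ_LEN and the other constants come from the full code.
# Load the pretrained BERT checkpoint with all layers trainable for fine-tuning.
model = load_trained_model_from_checkpoint(
    config_path,
    checkpoint_path,
    training=True,
    trainable=True,
    seq_len=SEQ_LEN,
)
# Keep only the token and segment inputs, and put a 2-way softmax
# classifier on top of the pooled 'NSP-Dense' output.
inputs = model.inputs[:2]
dense = model.get_layer('NSP-Dense').output
outputs = keras.layers.Dense(units=2, activation='softmax')(dense)
model = keras.models.Model(inputs, outputs)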

model.compile(
    RAdam(lr=LR),
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'],
)
history = model.fit(
    train_x,
    train_y,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_split=0.20,
    shuffle=True,
)

# Predicted class index for every test example.
predicts = model.predict(test_x, verbose=True).argmax(axis=-1)

# Classify a few raw sentences.
texts = [
    "It's a must watch",
    "Can't wait for it's next part!",
    'It fell short of expectations.',
]
for text in texts:
    # Token ids padded/truncated to SEQ_LEN; segment ids are all zeros for a single sentence.
    ids, segments = tokenizer.encode(text, max_len=SEQ_LEN)
    token_ids = np.array(ids).reshape([1, SEQ_LEN])
    predicted_id = model.predict([token_ids, np.zeros_like(token_ids)]).argmax(axis=-1)[0]
    print("%s: %s" % (id_to_labels[predicted_id], text))

Output:

positive: It's a must watch
positive: Can't wait for it's next part!
negative: It fell short of expectations.

Hope that helps.

BERT sentence classification, Python
