Applying a trained BERT model to make predictions

Time: 2020-06-25 11:33:36

Tags: python multilabel-classification bert-language-model

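I am currently working on a multi-label classification task for text data. I have a dataframe with an ID column, a text column, and several label columns that contain only 1 or 0.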

I used an existing solution proposed on this website, Kaggle Toxic Comment Classification using Bert, which reports, as a percentage, how strongly a text belongs to each label.

Now that I have trained my model, I would like to test it on a few unlabeled text extracts to get the percentage of belonging to each label:

I have tried this solution:

def getPrediction(in_sentences):
  label = ['S1, S2, S3']
  input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label=label) for x in in_sentences]
  input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]

pred_sentences = [
  "here is an exemple of sentence"]

pred_sentences = ''.join(pred_sentences)

predictions = getPrediction(pred_sentences)

I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-490-770bf0871d3e> in <module>
----> 1 predictions = getPrediction(pred_sentences)

<ipython-input-486-3de7328d60db> in getPrediction(in_sentences)
      2   label = ['S1','S2',
      3    'S3']
----> 4   input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, labels=label) for x in in_sentences]
      5   input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
      6   predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)

<ipython-input-486-3de7328d60db> in <listcomp>(.0)
      2   label = ['S1,
      3    S2,S3']
----> 4   input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, labels=label) for x in in_sentences]
      5   input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
      6   predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)

TypeError: __init__() got an unexpected keyword argument 'labels'

Do you know what needs to change to make the last part of the algorithm work?

1 answer:

Answer 0 (score: 0)

You have a typo: InputExample expects a keyword argument named label, not labels:

[run_classifier.InputExample(guid="", text_a = x, text_b = None, labels=label) for x in in_sentences]
                                                                      ^
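
To make the keyword mismatch concrete, here is a minimal sketch. It does not import BERT; the InputExample class below is a hypothetical stand-in that assumes the constructor signature (guid, text_a, text_b=None, label=None) used by run_classifier.InputExample, and build_examples is an illustrative helper name that does not appear in the question. It shows that label= is accepted while labels= raises the same kind of TypeError as in the traceback.

```python
# Hypothetical stand-in: assumed to mirror the constructor signature of
# run_classifier.InputExample (guid, text_a, text_b=None, label=None).
class InputExample:
    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


def build_examples(in_sentences, label):
    # Illustrative helper: the keyword is `label` (singular),
    # matching the constructor above.
    return [
        InputExample(guid="", text_a=x, text_b=None, label=label)
        for x in in_sentences
    ]


# Pass the list of sentences itself, one example per sentence.
pred_sentences = ["here is an exemple of sentence"]
examples = build_examples(pred_sentences, label=["S1", "S2", "S3"])

# Using `labels=` instead reproduces the TypeError from the question.
try:
    InputExample(guid="", text_a="x", text_b=None, labels=["S1"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Separately, note that ''.join(pred_sentences) in the question turns the list into a single string, so the list comprehension would iterate over individual characters rather than sentences. Passing the list directly, as in the sketch, avoids that.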