Training a spacy-transformers text classifier - trying a simple training example

Asked: 2020-06-17 12:05:14

Tags: spacy text-classification spacy-pytorch-transformers

After a good deal of experimentation along the lines shown in these examples:

https://colab.research.google.com/github/explosion/spacy-pytorch-transformers/blob/master/examples/Spacy_Transformers_Demo.ipynb

https://github.com/explosion/spacy-transformers/blob/master/examples/train_textcat.py

I find that calling nlp.update on a spacy-transformers model produces no observable learning effect. I have tried this with en_trf_bertbaseuncased_lg, as shown below, and with the en_trf_distilbertbaseuncased_lg model, with no luck. I can get text classification working with spaCy's TextCategorizer and with an LSTM example.

So my question is: what do I need to change in the code below so that the score for "THE_POSITIVE_LABEL" comes out below 1.0 when doc.cats is called on the test sentence? It currently runs without errors but always returns 1.0. I arrived at this example after running a proper training set and watching the P, R, and F scores stay identical at every evaluation, while only the loss value jumped around. A corrected version could serve as a simple test of this functionality.

import spacy
from collections import Counter

nlp = spacy.load('en_trf_bertbaseuncased_lg')

textcat = nlp.create_pipe(
    "trf_textcat",
    config={
        # have also tried "softmax_last_hidden" with "words_per_batch" like in one of the examples
        "architecture": "softmax_class_vector",
        # added as otherwise it complains about the textcat config not having 'token_vector_width'
        "token_vector_width": 768,
    },
)

textcat.add_label("THE_POSITIVE_LABEL")

nlp.add_pipe(textcat, last=True)

nlp.begin_training() # added as otherwise it says trf_textcat has no model when we call doc.cats

print(nlp("an example of a document that does not  match the label").cats)

#{'THE_POSITIVE_LABEL': 1.0} is printed

optimizer = nlp.resume_training()
optimizer.alpha = 0.001
optimizer.trf_weight_decay = 0.005
optimizer.L2 = 0.0
optimizer.trf_lr = 2e-5

losses = Counter()

texts = ['an example of a document that does not  match the label',]

annotations = [{'THE_POSITIVE_LABEL': 0.},]

nlp.update(texts, annotations, sgd=optimizer, drop=0.1, losses=losses)

print(nlp("an example of a document that does not  match the label").cats)

#{'THE_POSITIVE_LABEL': 1.0} is again printed
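One thing worth noting about the code above (an observation, not a confirmed diagnosis of the problem): the "softmax_class_vector" architecture normalizes scores over the set of labels, and with only one label added, a softmax is mathematically pinned at 1.0 no matter what logit the model produces. A minimal sketch of that behavior, independent of spaCy:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# With a single label, softmax always yields 1.0, whatever the logit:
print(softmax([-5.0]))  # [1.0]
print(softmax([42.0]))  # [1.0]

# With two labels (e.g. also adding a hypothetical "THE_NEGATIVE_LABEL"),
# the scores can actually move during training:
print(softmax([1.0, -1.0]))
```

So a score below 1.0 may only be observable once a second label is registered, assuming the pipeline really is using a softmax over the label set.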

0 Answers:

No answers yet.