Are 100 training examples enough to train a custom NER with spaCy?

Date: 2019-05-27 17:22:30

Tags: python-3.x machine-learning nlp spacy

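I have trained an NER model on name data. I generated some random sentences containing person names, about 70 sentences in total, and annotated them in spaCy's training format.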

I trained the custom NER starting from both the blank 'en' model and 'en_core_web_sm', but when I test it on arbitrary strings, it detects the name in only a few cases.

Is this number of examples not enough?

My data looks like this:

[("'Hi, I am looking for a house on rent for a year. Best Regards, Rajesh',\r",
  {'entities': [(56, 63, 'name')]}),
 ("'Hello everyone, I am Gunjan Arora',\r", {'entities': [(22, 34, 'name')]}),
 ("'Greetings!, I am 34 years old. I want a car for my wife Bella Roy',\r",
  {'entities': [(60, 69, 'name')]}),
 ("'Heyo, I lived with my family comprises 4 people and myself Randy Lao',\r",
  {'entities': [(60, 69, 'name')]}),
 ("'I am Geetanjali. ',\r", {'entities': [(6, 16, 'name')]})]

I have generated some 70 examples like this.
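Before adding more data, it may be worth confirming that each character offset really selects the intended name. The sketch below is plain Python, not part of spaCy; it slices every text with its annotated span so the result can be inspected by eye. `show_spans` is a hypothetical helper name, and the strings are copied as they appear in this post, including the stray leading quote and the trailing `',\r`. In the sample as rendered here, at least the first annotation cannot be right: the span (56, 63) is seven characters long, while "Rajesh" has six. Whitespace may have been altered when the post was rendered, so the real data could differ.

```python
# Sample copied from the question as posted (stray quote and "',\r" tail included).
TRAIN_DATA = [
    ("'Hi, I am looking for a house on rent for a year. Best Regards, Rajesh',\r",
     {'entities': [(56, 63, 'name')]}),
    ("'Hello everyone, I am Gunjan Arora',\r", {'entities': [(22, 34, 'name')]}),
    ("'Greetings!, I am 34 years old. I want a car for my wife Bella Roy',\r",
     {'entities': [(60, 69, 'name')]}),
    ("'Heyo, I lived with my family comprises 4 people and myself Randy Lao',\r",
     {'entities': [(60, 69, 'name')]}),
    ("'I am Geetanjali. ',\r", {'entities': [(6, 16, 'name')]}),
]


def show_spans(data):
    """Return (sliced_text, label) for every annotated span in the data."""
    rows = []
    for text, annotations in data:
        for start, end, label in annotations['entities']:
            # Slicing with the annotated offsets shows what the model is told is a name.
            rows.append((text[start:end], label))
    return rows


for span, label in show_spans(TRAIN_DATA):
    # repr() makes stray quotes, commas and "\r" inside a span visible.
    print(repr(span), label)
```

Any printed span that is not exactly a name points to an offset that needs correcting. The leading quote and the `',\r` tail also suggest the sentences were read from a file without stripping the surrounding punctuation, which is probably worth cleaning up as well.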

Losses during training:

 - 1.Losses {'ner': 6.307317615201415} 
 - 2.Losses {'ner': 11.182436657139132}
 - 3.Losses {'ner': 6.014345924849759}
 - 4.Losses {'ner': 6.442589285506237}
 - 5.Losses {'ner': 5.328383899880891}
 - 6.Losses {'ner': 1.706726450400089}
 - 7.Losses {'ner': 3.9960324752880005}
 - 8.Losses {'ner': 5.415169572852782}

These are the losses when I use the blank 'en' model.

Please advise.

I want to detect names because the pretrained models themselves also fail to detect names in most cases.

1 Answer:

Answer 0: (score: 0)

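To get better results you will need to generate more examples. Even though this is not a complex problem, 70 examples are not enough to train your model. I suggest at least tripling the number of generated examples so the model fits better.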
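One way to produce more examples is to fill sentence templates with names and compute the offsets in code, so the spans always match the text. This is only a minimal sketch in plain Python: `TEMPLATES`, `NAMES` and `make_examples` are hypothetical and purely illustrative, and a real data set would need far more varied sentences and names than shown here, otherwise the model is likely to memorise the few patterns it sees.

```python
import random

# Hypothetical templates and names, for illustration only; replace with your own.
TEMPLATES = [
    "Hi, I am looking for a house on rent for a year. Best Regards, {name}",
    "Hello everyone, I am {name}",
    "I want a car for my wife {name}",
    "{name} here, please call me back",
]
NAMES = ["Rajesh", "Gunjan Arora", "Bella Roy", "Randy Lao", "Geetanjali"]


def make_examples(n, seed=0):
    """Build n examples in the (text, {'entities': [...]}) format used in the question."""
    rng = random.Random(seed)  # fixed seed so the generated set is reproducible
    examples = []
    for _ in range(n):
        template = rng.choice(TEMPLATES)
        name = rng.choice(NAMES)
        # Each template holds exactly one placeholder; its position is where the name starts.
        start = template.index("{name}")
        end = start + len(name)
        text = template.replace("{name}", name)
        examples.append((text, {"entities": [(start, end, "name")]}))
    return examples


# Roughly three times the original 70 sentences.
examples = make_examples(210)
print(examples[0])
```

Because the offsets are derived from the template rather than typed by hand, `text[start:end]` equals the inserted name for every generated example.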
