Question

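Hi, I know that Stanford NER's english.muc.7class.distsim.crf.ser.gz model classifies text into 7 classes: Location, Person, Organization, Money, Percent, Date, and Time. I want to classify text into a different set of 7 classes, for example a person's full name, money, date, time, location, degree, and so on. How can I customize a model for this with an NLP library such as Stanford NLP, GATE, or OpenNLP?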

Answer 1

Well, if you use OpenNLP, follow the instructions in this documentation and create your training data like this:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .

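The <START:...> and <END> markers are the tags you add around every entity you want the model to find, using a different type name for each kind of entity. Then build the model with the training API or the CLI described in the documentation.

Also, if your training set has around 15,000 lines, you can expect good results!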
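If your annotations already exist as token offsets, the tagged lines can be generated rather than typed by hand. The following is a minimal Python sketch, not part of OpenNLP: the `annotate` function and the `(start, end, label)` span layout are hypothetical helpers for illustration, and it assumes each sentence is already split into tokens and that spans do not overlap.

```python
def annotate(tokens, spans):
    """Wrap entity spans in <START:label> ... <END> tags.

    tokens: list of str, one tokenized sentence.
    spans:  list of (start, end, label) token offsets, end exclusive,
            assumed not to overlap (hypothetical layout for this sketch).
    """
    starts = {start: label for start, end, label in spans}
    ends = {end for start, end, label in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close the span that ended before this token
        if i in starts:
            out.append(f"<START:{starts[i]}>")
        out.append(token)
    if len(tokens) in ends:
        out.append("<END>")  # span that runs to the end of the sentence
    return " ".join(out)


tokens = "Mr . Vinken is chairman of Elsevier N.V. .".split()
print(annotate(tokens, [(2, 3, "person")]))
```

The same helper works for any type name you choose, such as a `degree` label for the classes mentioned in the question.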

Answer 2

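In OpenNLP, you can create a custom NER model with the following steps.

First, you need to prepare your training data in the format <START:entity-name> ..... <END>. Let's say you want to create a medical NER model. The data would look like this:

Example: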

<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and <START:medicine> potassium clavulanate <END> .
They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections .

The training data should contain at least 15,000 sentences to get good results.
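Before training, it can help to check how many sentences the file contains and whether every tag is well formed. Below is a rough Python sketch of such a check. The function name `check_training_lines` is made up for this example, and it rests on my understanding of the format: one tokenized sentence per line, tags as separate whitespace-delimited tokens, and blank lines used only as document separators.

```python
import re
from collections import Counter

START = re.compile(r"<START(?::([^:>\s]*))?>")
END = "<END>"


def check_training_lines(lines):
    """Return (sentence_count, entity_counts, errors) for annotated lines."""
    sentences = 0
    counts = Counter()
    errors = []
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # blank line: treated as a document separator
        sentences += 1
        open_label = None
        for token in tokens:
            start = START.fullmatch(token)
            if start:
                if open_label is not None:
                    errors.append((number, "<START> inside an open span"))
                open_label = start.group(1) or "default"
                counts[open_label] += 1
            elif token == END:
                if open_label is None:
                    errors.append((number, "<END> without <START>"))
                open_label = None
            elif "<START" in token or END in token:
                # e.g. a tag glued to punctuation such as "<END>."
                errors.append((number, "tag not separated by spaces: " + token))
        if open_label is not None:
            errors.append((number, "missing <END>"))
    return sentences, counts, errors


sample = [
    "<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic .",
    "",
    "<START:medicine> amoxicillin trihydrate <END> and <START:medicine> potassium clavulanate <END>.",
]
sentences, counts, errors = check_training_lines(sample)
print(sentences, dict(counts), errors)
```

On a real file you would pass the open file object instead of `sample` and compare the sentence count against the 15,000 guideline.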

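Next, train a TokenNameFinderModel, passing the desired model file name and the path to the training data file.

You can create one from the command line like this: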

$ opennlp TokenNameFinderTrainer -model en-ner-drugs.bin -lang en -data drugsDetails.txt -encoding UTF-8
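If you retrain often, the command can be assembled from a script. This is only a small Python sketch: `trainer_command` is a hypothetical helper, it uses just the flags shown in the command above, and it assumes the `opennlp` launcher is on your PATH when you actually run the result.

```python
import shlex


def trainer_command(model_path, data_path, lang="en", encoding="UTF-8"):
    """Build the argument list for the TokenNameFinderTrainer command."""
    return [
        "opennlp", "TokenNameFinderTrainer",
        "-model", model_path,
        "-lang", lang,
        "-data", data_path,
        "-encoding", encoding,
    ]


command = trainer_command("en-ner-drugs.bin", "drugsDetails.txt")
print(shlex.join(command))
# To actually train, pass the list to subprocess.run(command, check=True).
```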

To do the same thing in Java, you can refer to the following article: Writing a custom NameFinder model in OpenNLP.

How to customize an NER model using NLP
