Question

I'm finding it hard to create my own model with OpenNLP. Can anyone tell me how to build a model and how to train it?

What should the input be, and where is the output model file stored?

Answer 1

https://opennlp.apache.org/docs/1.5.3/manual/opennlp.html

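This site is very useful. It shows how to train all the different kinds of models, such as entity extraction and part-of-speech tagging, both in code and with the OpenNLP command-line application.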

I could give you some code examples here, but the page explains usage very clearly.

In theory:

Basically, you create a file that lists the examples you want to train on, for example:

Sports[whitespace]This is a page about football, rugby and other things

Politics[whitespace]This is a page about Tony Blair as Prime Minister.
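As a minimal sketch (standard-library Java only, no OpenNLP classes), the helper below builds lines in this category-plus-text layout. It assumes the document-categorizer format, where the first whitespace-separated token of a line is the category and the rest of the line is the document text. `DoccatLines`, `formatLine` and the output file name `en-doccat.train` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds "category<space>text" training lines.
final class DoccatLines {

    // One document per line: the category must be a single token, and the
    // text is flattened so it contains no line breaks or whitespace runs.
    static String formatLine(String category, String text) {
        if (category.isEmpty() || category.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("category must be one token: '" + category + "'");
        }
        String flat = text.trim().replaceAll("\\s+", " ");
        if (flat.isEmpty()) {
            throw new IllegalArgumentException("empty document for category " + category);
        }
        return category + " " + flat;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList(
                formatLine("Sports", "This is a page about football, rugby and other things"),
                formatLine("Politics", "This is a page about Tony Blair as Prime Minister"));
        // Hypothetical output file name; use it as the training data file.
        Files.write(Paths.get("en-doccat.train"), lines, StandardCharsets.UTF_8);
    }
}
```

Because the category is simply the first token on the line, a label such as `Sports news` would be split into the category `Sports` plus an extra word of text; the sketch rejects such labels instead of writing them.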

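The page above describes the format (each model type needs a different format). Once you have created this file, you can train on it through the API or with the opennlp command-line application, which produces a .bin file. You can then load this .bin file into a model and start using it (following the API described on the site above).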

Answer 2

First, you need training data annotated with the entities you want.

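Sentences should be separated by newlines (\n), one sentence per line. Tokens should be separated by spaces, and entities are marked with tags. For example, if you want to create a medicine entity model, the data should look like this: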

<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and <START:medicine> potassium clavulanate <END> .
They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections .
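To check data like this before training, here is a rough standard-library sketch that splits an annotated line on whitespace, treats `<START:type>` and `<END>` as markers rather than tokens, and records each entity as a half-open range of token indices. It is modeled on the format described above, not on OpenNLP's own parser; `NameAnnotations`, `Sample` and `Entity` are hypothetical names, and mapping a bare `<START>` to the type `default` is this sketch's own choice. It also rejects tags glued to punctuation, such as `<END>.`, since the markers must stand as separate whitespace-delimited tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical validator for "<START:type> ... <END>" annotated sentences.
final class NameAnnotations {

    // An entity covering tokens [start, end) of the sentence.
    static final class Entity {
        final String type;
        final int start;
        final int end;

        Entity(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // The plain tokens of one sentence plus the entities marked in it.
    static final class Sample {
        final List<String> tokens = new ArrayList<>();
        final List<Entity> entities = new ArrayList<>();
    }

    private static final Pattern START = Pattern.compile("<START(?::([^:>\\s]+))?>");

    static Sample parse(String line) {
        Sample sample = new Sample();
        String openType = null;
        int openStart = -1;
        for (String part : line.trim().split("\\s+")) {
            if (part.isEmpty()) continue; // blank line: no tokens
            Matcher m = START.matcher(part);
            if (m.matches()) {
                if (openType != null) throw new IllegalArgumentException("nested <START>: " + line);
                openType = m.group(1) != null ? m.group(1) : "default";
                openStart = sample.tokens.size();
            } else if (part.equals("<END>")) {
                if (openType == null) throw new IllegalArgumentException("<END> without <START>: " + line);
                sample.entities.add(new Entity(openType, openStart, sample.tokens.size()));
                openType = null;
            } else if (part.startsWith("<START") || part.startsWith("<END>")) {
                // e.g. "<END>." : a tag must be its own whitespace-separated token
                throw new IllegalArgumentException("malformed tag '" + part + "': " + line);
            } else {
                sample.tokens.add(part);
            }
        }
        if (openType != null) throw new IllegalArgumentException("unclosed <START:" + openType + ">: " + line);
        return sample;
    }

    public static void main(String[] args) {
        Sample s = parse("contains two medicines - <START:medicine> amoxicillin trihydrate <END> and");
        for (Entity e : s.entities) {
            System.out.println(e.type + ": " + String.join(" ", s.tokens.subList(e.start, e.end)));
        }
    }
}
```

Running `main` prints `medicine: amoxicillin trihydrate`.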

For reference, see the sample dataset. The training data should contain at least 15000 sentences to get good results.
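Checking the 15000-sentence guideline can be as simple as counting non-blank lines, since the format puts one sentence per line. A minimal sketch, assuming the training file is `data.txt` encoded in UTF-8; `TrainingDataSize` and `RECOMMENDED_SENTENCES` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical size check: one sentence per non-blank line.
final class TrainingDataSize {

    // Rule of thumb from the answer: at least 15000 sentences.
    static final int RECOMMENDED_SENTENCES = 15000;

    // Blank lines are not sentences, so they are not counted.
    static long countSentences(List<String> lines) {
        return lines.stream().filter(line -> !line.trim().isEmpty()).count();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("data.txt"), StandardCharsets.UTF_8);
        long n = countSentences(lines);
        System.out.println(n + " sentences"
                + (n < RECOMMENDED_SENTENCES ? " (fewer than " + RECOMMENDED_SENTENCES + ")" : ""));
    }
}
```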

You can then train the model with the OpenNLP TokenNameFinderTrainer. The output file will be in .bin format.

Here is an example: Writing a custom NameFinder model in OpenNLP

For details, see the OpenNLP documentation.

Answer 3

Maybe this article will help you. It describes how to train a TokenNameFinder on data extracted from Wikipedia:

nuxeo - blog - Mining Wikipedia with Hadoop and Pig for Natural Language Processing

Answer 4

Copy the training data into data.txt and run the code below to get your own mymodel.bin.

For the data, you can use: https://github.com/mccraigmccraig/opennlp/blob/master/src/test/resources/opennlp/tools/namefind/AnnotatedSentencesWithTypes.txt

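import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

// Trains a name finder model (OpenNLP 1.5.x API; later versions changed these signatures).
public class Training {
    // where the trained model is written
    static String onlpModelPath = "mymodel.bin";
    // training data set: one annotated sentence per line
    static String trainingDataFilePath = "data.txt";

    public static void main(String[] args) throws IOException {
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(
                new FileInputStream(trainingDataFilePath), charset);
        // parses "<START:type> ... <END>" annotations into NameSample objects
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;
        try {
            // default training parameters; to set iterations and cutoff explicitly use:
            // model = NameFinderME.train("en", "drugs", sampleStream,
            //         Collections.<String, Object>emptyMap(), 100, 4);
            model = NameFinderME.train("en", "drugs", sampleStream,
                    Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close(); // closes the whole stream chain, including the file
        }

        // write the trained model to mymodel.bin
        OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath));
        try {
            model.serialize(modelOut);
        } finally {
            modelOut.close();
        }
    }
}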

Training your own model in OpenNLP
