Question

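I am trying to train a custom NER model with OpenNLP. When I pass a sentence to predict entities, it only picks the first word of the sentence. I don't know where I went wrong.

Please find the model training code below: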

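import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class OpenNLPNER {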
    public static void main(String[] args) {
        train("en", "technology", "D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\resources\\technology.train", "D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\techno1.bin");
    }

    public static String train(String lang, String entity, InputStreamFactory inputStream, FileOutputStream modelStream) {

        Charset charset = Charset.forName("UTF-8");
        TokenNameFinderModel model = null;
        ObjectStream<NameSample> sampleStream = null;
        try {
            ObjectStream<String> lineStream = new PlainTextByLineStream(inputStream, charset);
            sampleStream = new NameSampleDataStream(lineStream);
            TokenNameFinderFactory nameFinderFactory = new TokenNameFinderFactory();
            model = NameFinderME.train("en", "technology", sampleStream, TrainingParameters.defaultParams(),
                nameFinderFactory);
        } catch (FileNotFoundException fio) {

        } catch (IOException io) {

        } finally {
            try {
                sampleStream.close();
            } catch (IOException io) {

            }
        }
        BufferedOutputStream modelOut = null;
        try {
            modelOut = new BufferedOutputStream(modelStream);
            model.serialize(modelOut);
        } catch (IOException io) {

        } finally {
            if (modelOut != null) {
                try {
                    modelOut.close();
                } catch (IOException io) {

                }
            }
        }
        return "Something goes wrong with training module.";
    }

    public static String train(String lang, String entity, String taggedCoprusFile,
                               String modelFile) {
        try {
            InputStreamFactory inputStream = new InputStreamFactory() {
                FileInputStream fileInputStream = new FileInputStream("D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\resources\\technology.train");

                public InputStream createInputStream() throws IOException {
                    return fileInputStream;
                }
            };
            // InputStreamFactory temp= new InputStream("D:\\dl4j-examples-master\\dl4j-examples-master\\dl4j-examples\\src\\main\\java\\opennlpExamples\\src\\main\\resources\\en-ner-medical.train") ;
            return train(lang, entity, inputStream,
                new FileOutputStream(modelFile));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "Something goes wrong with training module.";
    }
}

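Now, when I load the saved model and pass a sentence to predict the output, it picks only the first word, and only when the first letter of that first word is uppercase.

Find the code that loads the model and predicts below: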

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class nameEntity {
    public static void main(String[] args) throws Exception {
        InputStream modelIn = new FileInputStream("D:/main/techno.bin");
        InputStream tokenModelIn = new FileInputStream("C:/openNLP/en-token.bin");
        try {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            // Instantiating the NameFinderME class
            NameFinderME nameFinder = new NameFinderME(model);
            TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);

            // Instantiating the TokenizerME class
            TokenizerME tokenizer = new TokenizerME(tokenModel);

            // The sentence to analyze
            String sentence = "Camel is a Java software";

            // Splitting the sentence into an array of tokens
            String tokens[] = tokenizer.tokenize(sentence);

            // Finding the names in the sentence
            nameFinder.clearAdaptiveData();
            Span nameSpans[] = nameFinder.find(tokens);
            System.out.println(sentence);
            // Printing the spans of the names in the sentence
            for (Span s : nameSpans) {
                System.out.println(s.toString() + "  " + tokens[s.getStart()]);
            }
        } finally {
            // Closes both model streams; a try block needs a catch or finally to compile
            modelIn.close();
            tokenModelIn.close();
        }
    }
}
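
Note that the loop above prints only `tokens[s.getStart()]`, that is, the first token of each span, so even a multi-token entity would be displayed as a single word. OpenNLP spans are half-open token ranges (start inclusive, end exclusive). Below is a minimal sketch, in plain Java without the OpenNLP dependency, of recovering the full text covered by a span. The class `SpanText` and the method `coveredText` are hypothetical names introduced for illustration, and the token array is made up.

```java
import java.util.Arrays;

class SpanText {
    // Hypothetical helper: joins tokens[start] .. tokens[end - 1] with single spaces.
    // start is inclusive and end is exclusive, mirroring a half-open token span.
    static String coveredText(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        // Made-up tokens for illustration only
        String[] tokens = {"Apache", "Camel", "is", "a", "Java", "software"};
        // Print the text of a two-token span covering tokens 0 and 1
        System.out.println(coveredText(tokens, 0, 2));
    }
}
```

With a real `Span s`, the equivalent call would be `coveredText(tokens, s.getStart(), s.getEnd())`; OpenNLP also provides `Span.spansToStrings(nameSpans, tokens)` for the same purpose.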

Training file:

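Abdera implementation of the Atom Syndication Format and Atom Publishing Protocol, Accumulo secure implementation of BigTable, ActiveMQ message broker supporting different communication protocols and clients, including a full Java Message Service (JMS) 1.1 client. Allura Python-based open source implementation of a software forge. Ant Java-based build tool, Apache Arrow "a high-performance cross-system data layer for columnar in-memory analytics". APR Apache Portable Runtime, a portability library written in C, Archiva Build Artifact Repository Manager, Apache Beam, an uber-API for big data, Beehive Java visual object model. Bloodhound defect tracker based on Trac[3]. Calcite dynamic data management framework, Camel declarative routing and mediation rules engine, which implements the Enterprise Integration Patterns using a Java-based domain-specific language.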
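
For reference, the training format that OpenNLP's `NameSampleDataStream` parses is one whitespace-tokenized sentence per line, with each entity wrapped in `<START:type> ... <END>` markers. The file content as displayed above shows no such markers; they may simply have been stripped when the page was rendered, so this is not a claim about the actual file. Below is a minimal sketch, in plain Java, of producing one annotated line. `TrainLine` and `annotate` are hypothetical names introduced for illustration; the sentence is the test sentence from the question.

```java
class TrainLine {
    // Hypothetical helper: wraps tokens[start] .. tokens[end - 1] in
    // <START:type> ... <END> and joins all tokens with single spaces.
    static String annotate(String[] tokens, int start, int end, String type) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            if (i == start) {
                sb.append("<START:").append(type).append("> ");
            }
            sb.append(tokens[i]);
            if (i == end - 1) {
                sb.append(" <END>");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Camel", "is", "a", "Java", "software"};
        // Marks the first token as a "technology" entity
        System.out.println(annotate(tokens, 0, 1, "technology"));
    }
}
```

Each such line would be one training sample; entities appearing in the middle of sentences and in varied contexts give the model more than capitalization and sentence position to learn from.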

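Output when the first letter of the first word is uppercase: Camel is a Java software [0..1) technology  Camel

Output when the first letter of the first word is not uppercase: camel is a Java software

What happens here is that, if the first word is found in the training file, the output is the first word of the sentence, if and only if that first word starts with an uppercase letter.

I tried training the model with OpenNLP tools versions 1.6.0 and 1.7.2.

Please tell me where the problem is. Am I missing any rule?

Thanks in advance.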

Open NLP NER is not properly trained

0 answers: