How to train OpenNLP without a file

Time: 2017-04-03 18:25:51

Tags: machine-learning nlp opennlp pos-tagger

I have the following code for training the OpenNLP POS tagger:

Trainer(String trainingData, String modelSavePath, String dictionary){

    try {
        dataIn = new MarkableFileInputStreamFactory(
                new File(trainingData));

        lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
        ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);

        POSTaggerFactory fac=new POSTaggerFactory();
        if(dictionary!=null && dictionary.length()>0)
        {
            fac.setDictionary(new Dictionary(new FileInputStream(dictionary)));
        }
        model = POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), fac);

    } catch (IOException e) {
        // Failed to read or parse training data, training failed
        e.printStackTrace();
    } finally {
        if (lineStream != null) {
            try {
                lineStream.close();
            } catch (IOException e) {
                // Not an issue, training already finished.
                // The exception should be logged and investigated
                // if part of a production system.
                e.printStackTrace();
            }
        }
    }
}

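This works well. Now, is it possible to do the same thing without involving a file? I want to store the training data in a database somewhere, then read it as a stream or in chunks and feed it to the trainer. I don't want to create a temporary file. Is this possible?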

1 answer:

Answer 0 (score: 1)

Yes. Instead of passing a FileInputStream to the dictionary, you can create your own InputStream implementation, say DatabaseSourceInputStream, and use that.
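
A minimal sketch of what such a stream might look like, using only the JDK. The class name DatabaseSourceInputStream comes from the answer, but its constructor and internals here are assumptions made for illustration: it pulls rows lazily from an Iterator<String>, which stands in for whatever database cursor is actually used (for example a wrapper around a JDBC ResultSet), and serves each row as one UTF-8 line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/**
 * Hypothetical InputStream that serves training lines pulled lazily from an
 * Iterator<String>. The iterator is an assumption standing in for a database
 * cursor; each element is expected to be one training sentence such as
 * "The_DT dog_NN barks_VBZ".
 */
class DatabaseSourceInputStream extends InputStream {
    private final Iterator<String> rows;
    private byte[] current = new byte[0]; // bytes of the row being served
    private int pos = 0;                  // next unread index in current

    DatabaseSourceInputStream(Iterator<String> rows) {
        this.rows = rows;
    }

    @Override
    public int read() throws IOException {
        // Refill the buffer from the next row once the current one is consumed.
        while (pos >= current.length) {
            if (!rows.hasNext()) {
                return -1; // no more rows: signal end of stream
            }
            // Append a newline so a line-based reader sees one sample per row.
            current = (rows.next() + "\n").getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        // Mask to return the byte as an unsigned value in the range 0..255.
        return current[pos++] & 0xFF;
    }
}
```

For the training data in the question, the stream would presumably be handed over through OpenNLP's InputStreamFactory interface: an implementation whose createInputStream() returns a new DatabaseSourceInputStream could be passed to PlainTextByLineStream in place of MarkableFileInputStreamFactory. Since the line stream may ask the factory for a stream again when it is reset, each call should probably start a fresh query rather than reuse a partly consumed cursor.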