I have the following code for training the OpenNLP POS tagger:
    Trainer(String trainingData, String modelSavePath, String dictionary) {
        try {
            dataIn = new MarkableFileInputStreamFactory(
                    new File(trainingData));
            lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
            ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);
            POSTaggerFactory fac = new POSTaggerFactory();
            if (dictionary != null && dictionary.length() > 0) {
                fac.setDictionary(new Dictionary(new FileInputStream(dictionary)));
            }
            model = POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), fac);
        } catch (IOException e) {
            // Failed to read or parse training data, training failed
            e.printStackTrace();
        } finally {
            if (lineStream != null) {
                try {
                    lineStream.close();
                } catch (IOException e) {
                    // Not an issue, training already finished.
                    // The exception should be logged and investigated
                    // if part of a production system.
                    e.printStackTrace();
                }
            }
        }
    }
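This works fine. Now, is it possible to do the same thing without involving files? I want to store the training data in a database somewhere, then read it as a stream or in chunks and feed it to the trainer. I don't want to create temporary files. Is this possible?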
Answer 0: (score: 1)
Yes. Instead of passing a FileInputStream to the dictionary, you can create your own InputStream implementation, say DatabaseSourceInputStream, and use that.
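As a rough sketch of what such a stream could look like: the class below is hypothetical, not part of OpenNLP. It assumes the training sentences arrive one row at a time through an Iterator<String>, which stands in here for a database cursor such as a JDBC ResultSet. Each row is served as one UTF-8 line, so nothing is written to disk and only one row is held in memory at a time.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

/**
 * Hypothetical InputStream that serves training sentences row by row.
 * The Iterator is an assumed stand-in for a database cursor.
 */
class DatabaseSourceInputStream extends InputStream {
    private final Iterator<String> rows;
    private byte[] current = new byte[0]; // bytes of the row being served
    private int pos = 0;                  // next unread index in current

    DatabaseSourceInputStream(Iterator<String> rows) {
        this.rows = rows;
    }

    @Override
    public int read() {
        // Once the current row is consumed, fetch the next one.
        while (pos >= current.length) {
            if (!rows.hasNext()) {
                return -1; // no more rows: end of stream
            }
            // Terminate each row with a newline so a line-based reader
            // sees one sentence per line.
            current = (rows.next() + "\n").getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        return current[pos++] & 0xFF; // unsigned byte value, 0..255
    }

    public static void main(String[] args) {
        // Sample rows in word_TAG format; made up for illustration only.
        Iterator<String> rows = Arrays.asList(
                "The_DT dog_NN barks_VBZ",
                "A_DT cat_NN sleeps_VBZ").iterator();
        DatabaseSourceInputStream in = new DatabaseSourceInputStream(rows);
        StringBuilder out = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            out.append((char) b); // sample data is ASCII, so this cast is safe
        }
        System.out.print(out);
    }
}
```

For the training data itself, the same stream should be usable in place of MarkableFileInputStreamFactory, since PlainTextByLineStream takes an InputStreamFactory: something along the lines of new PlainTextByLineStream(() -> new DatabaseSourceInputStream(openCursor()), "UTF-8"), where openCursor() is a hypothetical method that runs the query. The factory may be asked for a stream more than once (for example when the sample stream is reset between training iterations), so it should return a fresh stream over a fresh cursor on each call rather than reuse one.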