//tagger
MaxentTagger tagger = new MaxentTagger(args[0]);
TokenizerFactory<CoreLabel> ptbTokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(),
"untokenizable=noneKeep");
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream(args[1]), "utf-8"));
PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, "utf-8"));
DocumentPreprocessor documentPreprocessor = new DocumentPreprocessor(r);
documentPreprocessor.setTokenizerFactory(ptbTokenizerFactory);
for (List<HasWord> sentence : documentPreprocessor) {
List<TaggedWord> tSentence = tagger.tagSentence(sentence);
pw.println(Sentence.listToString(tSentence, false));
}
失败并出现以下异常 从C:\ work \ development \ workspace \ stanfordnlp \ sample.txt中读取POS标记模型...
C:\work\development\workspace\stanfordnlp\sample.txtException in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:869)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:767)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:298)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:263)
at phoenix.TokenizerDemo.main(TokenizerDemo.java:42)
Caused by: java.io.StreamCorruptedException: invalid stream header: 416E6F74
at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
at java.io.ObjectInputStream.<init>(Unknown Source)
at edu.stanford.nlp.tagger.maxent.TaggerConfig.readConfig(TaggerConfig.java:748)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:804)
... 4 more
答案 0 :(得分:1)
日志应清楚地表明问题:
从C:\ work \ development \ workspace \ stanfordnlp \ sample.txt中读取POS标记模型......
您错误地实例化了MaxentTagger
实例。如果为构造函数提供单个字符串参数,则该字符串应提供标记器模型文件的路径。
有关详细信息,请参阅documentation for MaxentTagger
。