How do I use Stanford CoreNLP with a non-English parser model?

Asked: 2013-10-23 01:28:07

Tags: nlp tagging stanford-nlp

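I am trying to detect whether a sentence is in the active or the passive voice. To do this, I run Stanford CoreNLP and look for the dependency relation 'nsubj' (= active) or 'nsubjpass' (= passive).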
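The check itself can be sketched roughly as follows. This is a minimal illustration, not the code linked in this question: it assumes the relation names have already been pulled out of the CoreNLP dependency graph into a plain list of strings, and the class name `VoiceFromRelations`, the `Voice` enum, and the `classify` method are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.List;

public class VoiceFromRelations {

    // Hypothetical result type for this sketch.
    enum Voice { ACTIVE, PASSIVE, UNKNOWN }

    // Classifies a sentence from the names of its dependency relations.
    // Assumption: a passive subject anywhere in the sentence marks it as passive,
    // so "nsubjpass" is checked before "nsubj".
    static Voice classify(List<String> relations) {
        if (relations.contains("nsubjpass")) {
            return Voice.PASSIVE;
        }
        if (relations.contains("nsubj")) {
            return Voice.ACTIVE;
        }
        return Voice.UNKNOWN; // no subject relation found at all
    }

    public static void main(String[] args) {
        // Relation names in the form the parser reports them, one per dependency.
        List<String> relns = Arrays.asList(
                "det", "nsubjpass", "auxpass", "root", "det", "prep_for");
        System.out.println(classify(relns));
    }
}
```

Checking 'nsubjpass' first is a design choice for sentences that contain both an active and a passive clause; a per-clause check would need the graph structure, not just the relation names.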

This works for English (code is here, if you are interested) and produces the following output:

Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from lib/stanford-postagger-full-2013-06-20/models/english-left3words-distsim.tagger ... done [1,2 sec].
Adding annotator lemma
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [1,1 sec].
reln: det
reln: nsubjpass <-- yeah! All I want. Passive sentence detected!
reln: auxpass
reln: root
reln: det
reln: prep_for

However, I now want to handle German as well, and I changed the following lines to do so:

Properties props = new Properties();
props.put("parse.flags", "");
props.put("pos.model", "lib/stanford-postagger-full-2013-06-20/models/german-fast.tagger");
props.put("annotators", "tokenize, ssplit, pos, lemma, parse");
props.put("parse.model", "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz"); // <--- not there

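This fails because the models jar (stanford-corenlp-3.2.0-models.jar) does not contain the parser model "germanPCFG.ser.gz"; it contains only the English one. There are German parser models available online that I can include (see this one, for example), but then I get a large stack trace: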

Loading parser from serialized file lib/stanford-postagger-full-2013-06-20/germanFactored.ser.gz ...
java.lang.NullPointerException
    at edu.stanford.nlp.parser.lexparser.BinaryGrammar.init(BinaryGrammar.java:224)
    at edu.stanford.nlp.parser.lexparser.BinaryGrammar.readObject(BinaryGrammar.java:211)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:172)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromSerializedFile(LexicalizedParser.java:607)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromFile(LexicalizedParser.java:401)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:158)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:144)
    at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:177)
    at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:107)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:736)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:260)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:127)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:123)
    at nlp.Tagger.parse(Tagger.java:83)
    at nlp.GUI$5.doInBackground(GUI.java:474)
    at nlp.GUI$5.doInBackground(GUI.java:468)
    at javax.swing.SwingWorker$1.call(SwingWorker.java:277)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at javax.swing.SwingWorker.run(SwingWorker.java:316)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Loading parser from text file lib/stanford-postagger-full-2013-06-20/germanFactored.ser.gz java.lang.RuntimeException: lib/stanford-postagger-full-2013-06-20/germanFactored.ser.gz: expecting BEGIN block; got ��
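To separate "the model file is not there" from "the model file is there but cannot be read", a small pre-flight check along these lines might help. It is only a diagnostic sketch that uses the standard library and no CoreNLP calls; the class name `ModelLocator` and the method `locate` are hypothetical, and the assumption that a model path may refer either to a classpath resource or to a file on disk is mine.

```java
import java.io.File;

public class ModelLocator {

    // Reports where a model path can be found: "classpath", "file", or "missing".
    // Assumption: the path is either a classpath resource (inside a jar)
    // or a path on the local file system.
    static String locate(String path) {
        if (ClassLoader.getSystemResource(path) != null) {
            return "classpath";
        }
        if (new File(path).isFile()) {
            return "file";
        }
        return "missing";
    }

    public static void main(String[] args) {
        // The two model paths that appear in this question.
        String[] candidates = {
            "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz",
            "lib/stanford-postagger-full-2013-06-20/germanFactored.ser.gz"
        };
        for (String path : candidates) {
            System.out.println(path + " -> " + locate(path));
        }
    }
}
```

If a path is reported as found and loading still fails as in the trace above, the problem lies in the content of the file and not in its location.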

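If I simply use the English parser model (englishPCFG.ser.gz) on German input, German passive sentences are not detected correctly. Any suggestions on how to proceed?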

0 answers:

No answers