有人可以通过使用coreNLP来帮助我纠正我为法语执行coreference注释的设置吗?我通过编辑属性文件尝试了基本建议:
annotators = tokenize, ssplit, pos, parse, lemma, ner, parse, depparse, mention, coref
tokenize.language = fr
pos.model = edu/stanford/nlp/models/pos-tagger/french/french.tagger
parse.model = edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz
命令:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props frenchProps.properties -file frenchFile.txt
获取以下输出日志:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/french/french.tagger ... done [0.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/frenchFactored.ser.gz ...
done [2.2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.9 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 83 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 267 rules
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
ago 23, 2016 5:37:34 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFORMACIÓN: Read 25 rules
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
PreComputed 100000, Elapsed Time: 1.639 (s)
Initializing dependency parser done [6.4 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention
Using mention detector type: rule
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3097)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2892)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1646)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at java.util.HashMap.readObject(HashMap.java:1402)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:324)
at edu.stanford.nlp.scoref.SimpleLinearClassifier.<init>(SimpleLinearClassifier.java:30)
at edu.stanford.nlp.scoref.PairwiseModel.<init>(PairwiseModel.java:75)
at edu.stanford.nlp.scoref.PairwiseModel$Builder.build(PairwiseModel.java:57)
at edu.stanford.nlp.scoref.ClusteringCorefSystem.<init>(ClusteringCorefSystem.java:31)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.fromProps(StatisticalCorefSystem.java:48)
at edu.stanford.nlp.pipeline.CorefAnnotator.<init>(CorefAnnotator.java:66)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.coref(AnnotatorImplementations.java:220)
at edu.stanford.nlp.pipeline.AnnotatorFactories$13.create(AnnotatorFactories.java:515)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:375)
这让我觉得有额外缺少配置的东西。