I have been trying to use Stanford CoreNLP for Chinese with the existing Chinese models (http://nlp.stanford.edu/software/stanford-chinese-corenlp-2014-10-23-models.jar).
When I run the following command, as suggested at http://nlp.stanford.edu/software/corenlp-faq.shtml#languages:
java -cp stanford-corenlp-3.5.0.jar:stanford-chinese-corenlp-2014-10-23-models.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-chinese.properties -file chinese_sample_text.txt
I always get a java.io.FileNotFoundException for this file: /u/nlp/data/chinese/distsim/xin_cmn_2000-2010.ldc.seg.utf8.all-c1000
Here is the full stack trace:
Registering annotator segment with class edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator
Adding annotator segment
Loading Segmentation Model [edu/stanford/nlp/models/segmenter/chinese/ctb.gz]...Loading classifier from edu/stanford/nlp/models/segmenter/chinese/ctb.gz ...
Loading Chinese dictionaries from 1 files: edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
loading dictionaries from edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz...Done. Unique words in ChineseDictionary is: 423200
done [19.6 sec].
done. Time elapsed: 19670 ms
Adding annotator ssplit
edu.stanford.nlp.pipeline.AnnotatorImplementations:ssplit.boundaryTokenRegex=[.]|[!?]+|[。]|[!?]+
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger ... done [2.8 sec].
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz ...
Loading distsim lexicon from /u/nlp/data/chinese/distsim/xin_cmn_2000-2010.ldc.seg.utf8.all-c1000 ...
edu.stanford.nlp.io.RuntimeIOException: java.io.FileNotFoundException:
/u/nlp/data/chinese/distsim/xin_cmn_2000-2010.ldc.seg.utf8.all-c1000 (No such file or directory)
at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:481)
at edu.stanford.nlp.io.IOUtils.readerFromFile(IOUtils.java:522)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:189)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.<init>(ReaderIteratorFactory.java:161)
at edu.stanford.nlp.objectbank.ReaderIteratorFactory.iterator(ReaderIteratorFactory.java:98)
at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:404)
at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:242)
at edu.stanford.nlp.ie.NERFeatureFactory.initLexicon(NERFeatureFactory.java:474)
at edu.stanford.nlp.ie.NERFeatureFactory.init(NERFeatureFactory.java:382)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.reinit(AbstractSequenceClassifier.java:172)
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2619)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1666)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1721)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1708)
at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2836)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:189)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:173)
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:113)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:65)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:99)
at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:319)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:289)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:126)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:122)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1056)
Caused by: java.io.FileNotFoundException: /u/nlp/data/chinese/distsim/xin_cmn_2000-2010.ldc.seg.utf8.all-c1000 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:475)
... 25 more
Loading classifier from edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz ...
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.FileNotFoundException
at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:321)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:289)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:126)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:122)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1056)
Caused by: java.io.FileNotFoundException
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:199)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:173)
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:113)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:65)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:99)
at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:319)
... 5 more
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to edu.stanford.nlp.classify.LinearClassifier
at edu.stanford.nlp.ie.ner.CMMClassifier.loadClassifier(CMMClassifier.java:1070)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1666)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1721)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1708)
at edu.stanford.nlp.ie.ner.CMMClassifier.getClassifier(CMMClassifier.java:1116)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:195)
... 10 more
Any help is much appreciated.
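Answer 0 (score: 2)
Update: this has been fixed in v3.5.1.
This is the same problem as in this question. It looks like we fixed the English and Spanish models, but not the German and Chinese ones. :( We will release a new version in a few days and make sure that all NER models are correct in v3.5.1.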
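
For anyone who cannot upgrade right away, one possible stopgap (an assumption on my part, not something suggested in the answer above) is to run the pipeline without the `ner` annotator, since that is the step whose model points at the missing distsim file. The sketch below only rewrites the standard `annotators` property; the class `ChineseNerWorkaround` and its helper methods are hypothetical names made up for illustration, and the annotator list is the one visible in the log from the question, which may be shorter than the list in the real properties file.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of CoreNLP.
class ChineseNerWorkaround {

    // Drops one annotator name from a comma-separated annotator list,
    // keeping the remaining names in their original order.
    public static String withoutAnnotator(String annotators, String toDrop) {
        return Arrays.stream(annotators.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty() && !name.equals(toDrop))
                .collect(Collectors.joining(", "));
    }

    // Returns a copy of the properties with "ner" removed from the
    // "annotators" entry; every other entry is copied unchanged.
    public static Properties withoutNer(Properties original) {
        Properties copy = new Properties();
        copy.putAll(original);
        String annotators = original.getProperty("annotators", "");
        copy.setProperty("annotators", withoutAnnotator(annotators, "ner"));
        return copy;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumption: annotator order as it appears in the question's log.
        props.setProperty("annotators", "segment, ssplit, pos, ner");

        Properties patched = withoutNer(props);
        System.out.println(patched.getProperty("annotators"));
        // With the CoreNLP jars on the classpath, the patched properties
        // could then be handed to new StanfordCoreNLP(patched).
    }
}
```

This only sidesteps the broken NER model; named-entity output is lost until the models from v3.5.1 are used.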