Stanford-NLP:处理中文文本时出现FileNotFoundException

时间:2015-01-27 09:40:45

标签: stanford-nlp

我一直在尝试使用现有的中文模型(http://nlp.stanford.edu/software/stanford-chinese-corenlp-2014-10-23-models.jar)将Stanford-CoreNLP用于中文。

当我按http://nlp.stanford.edu/software/corenlp-faq.shtml#languages -

中的建议执行以下命令时
java -cp stanford-corenlp-3.5.0.jar:stanford-chinese-corenlp-2014-10-23-models.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-chinese.properties -file chinese_sample_text.txt

我总是得到这个文件的 java.io.FileNotFoundException - /u/nlp/data/chinese/distsim/xin_cmn_2000-2010.ldc.seg.utf8.all-c1000

下面是完整的堆栈跟踪 -

Registering annotator segment with class edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator Adding annotator segment Loading Segmentation Model [edu/stanford/nlp/models/segmenter/chinese/ctb.gz]...Loading classifier from edu/stanford/nlp/models/segmenter/chinese/ctb.gz ... Loading Chinese dictionaries from 1 files:   edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz

loading dictionaries from edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz...Done. Unique words in ChineseDictionary is: 423200 done [19.6 sec]. done. Time elapsed: 19670 ms 
Adding annotator ssplit edu.stanford.nlp.pipeline.AnnotatorImplementations:ssplit.boundaryTokenRegex=[.]|[!?]+|[。]|[!?]+

Adding annotator pos 
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger ... done [2.8 sec]. 
Adding annotator ner 
Loading classifier from edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz ... 
Loading distsim lexicon from /u/nlp/data/chinese/distsim/xin_cmn_2000-2010.ldc.seg.utf8.all-c1000 ... 

edu.stanford.nlp.io.RuntimeIOException: java.io.FileNotFoundException: 
/u/nlp/data/chinese/distsim/xin_cmn_2000-2010.ldc.seg.utf8.all-c1000 (No such file or directory)

    at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:481)
    at edu.stanford.nlp.io.IOUtils.readerFromFile(IOUtils.java:522)
    at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:189)
    at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.<init>(ReaderIteratorFactory.java:161)
    at edu.stanford.nlp.objectbank.ReaderIteratorFactory.iterator(ReaderIteratorFactory.java:98)
    at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:404)
    at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:242)
    at edu.stanford.nlp.ie.NERFeatureFactory.initLexicon(NERFeatureFactory.java:474)
    at edu.stanford.nlp.ie.NERFeatureFactory.init(NERFeatureFactory.java:382)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.reinit(AbstractSequenceClassifier.java:172)
    at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2619)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1666)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1721)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1708)
    at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2836)
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:189)
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:173)
    at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:113)   at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:65)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:99)
    at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:319
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:289)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:126)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:122)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1056)
Caused by: java.io.FileNotFoundException: /u/nlp/data/chinese/distsim/xin_cmn_2000-2010.ldc.seg.utf8.all-c1000 (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:475)    ... 25 more 

Loading classifier from edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz ... Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.FileNotFoundException
    at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:321)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:289)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:126)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:122)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1056)
Caused by: java.io.FileNotFoundException
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:199)
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:173)
    at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:113)
    at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:65)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:99)
    at edu.stanford.nlp.pipeline.AnnotatorFactories$6.create(AnnotatorFactories.java:319)   ... 5 more
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to edu.stanford.nlp.classify.LinearClassifier
    at edu.stanford.nlp.ie.ner.CMMClassifier.loadClassifier(CMMClassifier.java:1070)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1666)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1721)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1708)
    at edu.stanford.nlp.ie.ner.CMMClassifier.getClassifier(CMMClassifier.java:1116)
    at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:195)   ... 10 more

非常感谢任何帮助。

1 个答案:

答案 0 :(得分:2)

更新:已在v3.5.1中修复。

这与this question中的问题相同。看起来我们修复了英语和西班牙语模型,但不是德语和中文模型。 :(我们将在几天内发布新版本,并确保所有NER模型在v3.5.1中都是正确的。