Training a Chinese model with Stanford CoreNLP

Time: 2020-07-27 16:12:00

Tags: java error-handling nlp stanford-nlp

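I am trying to train my own Chinese NER model with CoreNLP, but when I run the training command I get a FileNotFoundException. I have seen posts saying this error was fixed in CoreNLP 3.5.0, but I am using 4.1.0 and it still happens.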

Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.FileNotFoundException: \u\nlp\data\chinese\distsim\xin_cmn_200907-201012.ldc.seg.utf8.c1000 (The system cannot find the path specified)
        at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:523)
        at edu.stanford.nlp.io.IOUtils.readerFromFile(IOUtils.java:558)
        at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.setNextObject(ReaderIteratorFactory.java:189)
        at edu.stanford.nlp.objectbank.ReaderIteratorFactory$ReaderIterator.<init>(ReaderIteratorFactory.java:161)
        at edu.stanford.nlp.objectbank.ReaderIteratorFactory.iterator(ReaderIteratorFactory.java:98)
        at edu.stanford.nlp.objectbank.ObjectBank$OBIterator.<init>(ObjectBank.java:411)
        at edu.stanford.nlp.objectbank.ObjectBank.iterator(ObjectBank.java:250)
        at edu.stanford.nlp.ie.NERFeatureFactory.initLexicon(NERFeatureFactory.java:588)
        at edu.stanford.nlp.ie.NERFeatureFactory.init(NERFeatureFactory.java:389)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.reinit(AbstractSequenceClassifier.java:210)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.<init>(AbstractSequenceClassifier.java:190)
        at edu.stanford.nlp.ie.crf.CRFClassifier.<init>(CRFClassifier.java:181)
        at edu.stanford.nlp.ie.crf.CRFClassifier.chooseCRFClassifier(CRFClassifier.java:2919)
        at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2930)
Caused by: java.io.FileNotFoundException: \u\nlp\data\chinese\distsim\xin_cmn_200907-201012.ldc.seg.utf8.c1000 (The system cannot find the path specified)
        at java.base/java.io.FileInputStream.open0(Native Method)
        at java.base/java.io.FileInputStream.open(FileInputStream.java:212)
        at java.base/java.io.FileInputStream.<init>(FileInputStream.java:154)
        at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:516)
        ... 13 more
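The trace shows the failure inside NERFeatureFactory.initLexicon, i.e. while CoreNLP opens the distributional-similarity (distsim) lexicon file named in the training properties; the path appears to be a Unix-style /u/nlp/... path being resolved on Windows. Because training can run for a long time, a pre-flight check can catch this before launching it. The sketch below is a hypothetical helper, not part of CoreNLP; it assumes the .prop file is UTF-8, that the relevant keys are useDistSim and distSimLexicon, and that useDistSim defaults to false when absent.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical pre-flight check for a CRFClassifier training .prop file (not part of CoreNLP). */
public class DistSimCheck {

    /** Loads a .prop file, assuming UTF-8 encoding. */
    static Properties load(Path propFile) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(propFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        return props;
    }

    /**
     * Returns null when the distsim lexicon will not cause a FileNotFoundException,
     * otherwise a message describing the problem.
     */
    static String missingLexicon(Properties props) {
        // Assumption: an absent useDistSim key means distsim features are off.
        boolean useDistSim = Boolean.parseBoolean(props.getProperty("useDistSim", "false").trim());
        if (!useDistSim) {
            return null; // lexicon is not needed
        }
        String lexicon = props.getProperty("distSimLexicon");
        if (lexicon == null || lexicon.trim().isEmpty()) {
            return "useDistSim=true but distSimLexicon is not set";
        }
        Path path = Paths.get(lexicon.trim());
        return Files.isRegularFile(path)
                ? null
                : "distSimLexicon not found: " + path.toAbsolutePath();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DistSimCheck <training.prop>");
            System.exit(1);
        }
        String problem = missingLexicon(load(Paths.get(args[0])));
        System.out.println(problem == null ? "distsim OK" : problem);
    }
}
```

Running it against the training properties file before calling CRFClassifier should print the resolved lexicon path that is missing, which makes it easier to see whether the file simply is not on this machine.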

1 answer:

Answer 0 (score: 0)

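We do not distribute that file publicly. I can ask whether we are allowed to share it. In the meantime, to avoid this error you need to change your training properties to set useDistSim = false.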
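If you would rather not hand-edit the properties file, the change the answer describes can be scripted. The following is a minimal sketch with a hypothetical helper class (not part of CoreNLP) that copies a training .prop file and forces useDistSim=false. It assumes the file is UTF-8 and leaves distSimLexicon untouched, since the answer only asks for the flag to be turned off.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper: writes a copy of a training .prop file with useDistSim=false. */
public class DisableDistSim {

    /** Returns a copy of the given properties with distsim features switched off. */
    static Properties withoutDistSim(Properties original) {
        Properties patched = new Properties();
        patched.putAll(original);                     // keep every other training setting
        patched.setProperty("useDistSim", "false");   // the change suggested in the answer
        return patched;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DisableDistSim <in.prop> <out.prop>");
            System.exit(1);
        }
        Properties original = new Properties();
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            original.load(in);
        }
        // store(Writer) writes non-ASCII values as-is, so a UTF-8 file stays readable.
        try (Writer out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            withoutDistSim(original).store(out, "useDistSim disabled");
        }
    }
}
```

For example, `java DisableDistSim chinese.prop chinese-nodistsim.prop`, then pass the new file to the training command with `-prop` (file names here are hypothetical). Note that Properties.store does not preserve the original file's comments or key order; if those matter, editing the single line by hand is simpler.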