CoreNLP on Apache Spark

Time: 2015-06-05 23:34:54

Tags: java concurrency apache-spark nlp stanford-nlp

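I'm not sure whether this is a Spark problem or an NLP problem. Please help. I am trying to run the Stanford CoreNLP library on Apache Spark, and when I run it on multiple cores I get the exception below. I am using the latest version of the NLP library, which is thread-safe.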

It happens in the map stage, at this call:

 pipeline.annotate(document);

java.util.ConcurrentModificationException

    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:463)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.analyzeNode(GrammaticalStructure.java:488)
    at edu.stanford.nlp.trees.GrammaticalStructure.<init>(GrammaticalStructure.java:201)
    at edu.stanford.nlp.trees.EnglishGrammaticalStructure.<init>(EnglishGrammaticalStructure.java:89)
    at edu.stanford.nlp.semgraph.SemanticGraphFactory.makeFromTree(SemanticGraphFactory.java:139)
    at edu.stanford.nlp.pipeline.DeterministicCorefAnnotator.annotate(DeterministicCorefAnnotator.java:89)
    at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:68)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:412)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.process(StanfordCoreNLP.java:441)
    at sampleApp.WordProcessor$2.call(WordProcessor.java:69)
    at sampleApp.WordProcessor$2.call(WordProcessor.java:1)

2 answers:

Answer 0 (score: 1)

I think this is a CoreNLP issue.

See also: Concurrent processing using Stanford CoreNLP (3.5.2)

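I had the same problem and solved it by using a build of the latest GitHub revision (as of today). In short, I think there was a bug in CoreNLP 3.5.2, and it has since been fixed.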

Answer 1 (score: 0)

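It's a bit hard to tell from such a small amount of code, but I think the key is the line eclipse -clean. Most likely you are trying to modify something that does not support modification; the workaround is to copy your input.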
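
For readers unfamiliar with the top frames of the trace (`ArrayList$Itr.checkForComodification` under `Collections$UnmodifiableCollection$1.next`), here is a minimal sketch of that mechanism and of the copy workaround. It does not use CoreNLP or Spark and does not reproduce the CoreNLP bug itself. The class and method names are hypothetical, and the list is modified from the same thread only to make the failure deterministic; in the question the change presumably comes from another thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; not part of CoreNLP or Spark.
public class ComodificationSketch {

    // Iterates over a read-only view while the backing ArrayList is changed.
    // Returns true if a ConcurrentModificationException was thrown.
    static boolean iterateWhileMutating(List<String> backing) {
        // The view's iterator delegates to the backing ArrayList's iterator.
        List<String> view = Collections.unmodifiableList(backing);
        try {
            for (String s : view) {
                backing.add(s + "-copy"); // structural change during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Same loop, but iterating over a private snapshot of the input.
    static boolean iterateOverCopy(List<String> backing) {
        List<String> snapshot = new ArrayList<>(backing); // defensive copy
        try {
            for (String s : snapshot) {
                backing.add(s + "-copy"); // does not affect the snapshot's iterator
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println("view threw: " + iterateWhileMutating(words));
        System.out.println("copy threw: " + iterateOverCopy(words));
    }
}
```

Copying only helps when the collection being iterated is one your own code controls. If the shared state lives inside the library, as the other answer suggests for CoreNLP 3.5.2, a defensive copy in the caller may not be enough.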