Piglatin jodatime error with StanfordCoreNLP

Time: 2013-06-14 10:54:10

Tags: apache-pig stanford-nlp

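I'm trying to create a Pig UDF that extracts the locations mentioned in a tweet, using the Stanford CoreNLP package accessed through the sista Scala API. It works fine when run locally with 'sbt run', but throws a "java.lang.NoSuchMethodError" exception when called from Pig: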

  

Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
2013-06-14 10:47:54,952 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
done [7.5 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ...
2013-06-14 10:48:02,108 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Collection threshold init = 18546688(18112K) used = 358671232(350264K) committed = 366542848(357952K) max = 699072512(682688K)
done [5.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ...
2013-06-14 10:48:10,522 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call- Usage threshold init = 18546688(18112K) used = 590012928(576184K) committed = 597786624(583776K) max = 699072512(682688K)
done [5.6 sec].
2013-06-14 10:48:11,469 [Thread-11] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoSuchMethodError: org.joda.time.Duration.compareTo(Lorg/joda/time/ReadableDuration;)I
    at edu.stanford.nlp.time.SUTime$Duration.compareTo(SUTime.java:3406)
    at edu.stanford.nlp.time.SUTime$Duration.max(SUTime.java:3488)
    at edu.stanford.nlp.time.SUTime$Time.difference(SUTime.java:1308)
    at edu.stanford.nlp.time.SUTime$Range.<init>(SUTime.java:3793)
    at edu.stanford.nlp.time.SUTime.<clinit>(SUTime.java:570)

Here is the relevant code:

object CountryTokenizer {
  def tokenize(text: String): String = {
    val locations = TweetEntityExtractor.NERLocationFilter(text)
    println(locations)
    locations.map(x => Cities.country(x)).flatten.mkString(" ")
  }
}

class PigCountryTokenizer extends EvalFunc[String] {
  override def exec(tuple: Tuple): java.lang.String = {
    val text: java.lang.String = Util.cast[java.lang.String](tuple.get(0))
    CountryTokenizer.tokenize(text)
  }
}

object TweetEntityExtractor {
    val processor:Processor = new CoreNLPProcessor()


    def NERLocationFilter(text: String): List[String] =  {
        val doc = processor.mkDocument(text)

        processor.tagPartsOfSpeech(doc)
        processor.lemmatize(doc)
        processor.recognizeNamedEntities(doc)

        val locations = doc.sentences.map(sentence => {
            val entities = sentence.entities.map(List.fromArray(_)) match {
                case Some(l) => l
                case _ => List()
            }
            val words = List.fromArray(sentence.words)

            (words zip entities).filter(x => {
                x._1 != "" && x._2 == "LOCATION" 
            }).map(_._1)
        })
        List.fromArray(locations).flatten
    }
}

I'm using sbt-assembly to build a fat jar, so the joda-time jar file should be accessible. What's going on?

1 Answer:

Answer 0: (score: 0)

Pig ships with its own version of joda-time (1.6), which is incompatible with 2.x.
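One possible workaround, sketched here as an assumption rather than a tested fix for this exact setup, is to shade the Joda-Time 2.x classes inside the fat jar so that CoreNLP's SUTime no longer resolves org.joda.time.Duration to Pig's 1.6 copy. The fragment below assumes an sbt-assembly release that supports shade rules and an sbt version that accepts the slash syntax; the target package name shaded.jodatime is hypothetical and can be anything that does not clash with existing packages.

```scala
// build.sbt (fragment). Assumes the sbt-assembly plugin is already enabled
// and is recent enough to provide assemblyShadeRules.
assembly / assemblyShadeRules := Seq(
  // Rename every org.joda.time class packaged in the fat jar, and rewrite
  // references to it in all bundled dependencies (including CoreNLP),
  // so they cannot collide with the joda-time 1.6 classes that Pig loads.
  // "shaded.jodatime" is an arbitrary, hypothetical package name.
  ShadeRule.rename("org.joda.time.**" -> "shaded.jodatime.@1").inAll
)
```

After rebuilding with 'sbt assembly', the UDF jar would carry its own renamed Joda-Time, so the version that Pig puts on the classpath should no longer matter. Whether Joda-Time's bundled time zone resources survive the rename intact is worth checking before relying on this.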