object databricks is not a member of package com

Date: 2018-03-31 14:26:03

Tags: apache-spark stanford-nlp apache-zeppelin databricks

I am trying to use the Stanford NLP library with Spark2 in Zeppelin (HDP 2.6). Apparently, Databricks built a wrapper around the Stanford NLP library for Spark: https://github.com/databricks/spark-corenlp

I downloaded the jar for the wrapper above from here, and the Stanford NLP jars from here. I then added both sets of jars as dependencies in Zeppelin's Spark2 interpreter settings and restarted the interpreter.
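
As a side note, instead of pointing the interpreter at hand-downloaded jars, Zeppelin's dependency loader can resolve the wrapper by Maven coordinates, which sidesteps grabbing the wrong artifact. A minimal sketch, run in a %dep paragraph before the Spark2 interpreter starts in the note; the repository URL, version numbers, and local jar path below are assumptions to adapt to your cluster's Spark/Scala build:

%dep
z.reset()
// spark-packages repository (URL is an assumption; it has moved over time)
z.addRepo("spark-packages").url("https://repos.spark-packages.org/")
// wrapper coordinate (assumed version; the Scala suffix must match the cluster)
z.load("databricks:spark-corenlp:0.2.0-s_2.11")
// CoreNLP itself plus its models jar are needed at runtime for ner/sentiment
z.load("edu.stanford.nlp:stanford-corenlp:3.6.0")
z.load("/path/to/stanford-corenlp-3.6.0-models.jar") // placeholder path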

The following sample program still fails with the error "object databricks is not a member of package com" on the line import com.databricks.spark.corenlp.functions._:

import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._

import sqlContext.implicits._

val input = Seq(
  (1, "<xml>Stanford University is located in California. It is a great university.</xml>")
).toDF("id", "text")

val output = input
  .select(cleanxml('text).as('doc))       // strip the XML tags
  .select(explode(ssplit('doc)).as('sen)) // split the document into sentences
  .select('sen, tokenize('sen).as('words), ner('sen).as('nerTags), sentiment('sen).as('sentiment))

output.show(truncate = false)
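
For completeness, this compile error normally means the wrapper jar is not visible to the interpreter's compiler at all. A quick way to check from a Zeppelin paragraph is to resolve the wrapper's class by name; the two candidate class names below are an assumption about how the library's functions object is compiled (plain object vs. package object):

// Classpath probe: prints true for any name the interpreter can load.
// If both print false, the jar is not on the classpath (wrong file, or
// the interpreter was not restarted after adding the dependency).
Seq(
  "com.databricks.spark.corenlp.functions$", // compiled name of a plain object
  "com.databricks.spark.corenlp.package$"    // compiled name of a package object
).foreach { name =>
  println(name + " -> " + scala.util.Try(Class.forName(name)).isSuccess)
}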

1 answer:

Answer 0 (score: 1):

The problem was with the jar file of the Databricks corenlp wrapper that I had downloaded. I downloaded it from location instead, and the problem was solved.
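
A likely explanation, offered as an assumption rather than something the answer states: the originally downloaded file was not the compiled wrapper (for example a sources or javadoc jar, or a build for a different Scala version), which produces exactly this import error. One way to verify a downloaded jar before wiring it into the interpreter is to list its entries; the jar path below is a placeholder:

// List the compiled classes inside the downloaded jar. If nothing is
// printed under com/databricks/spark/corenlp/, the jar cannot satisfy
// the import, no matter how it is added to the interpreter.
import java.util.jar.JarFile
import scala.collection.JavaConverters._

val jar = new JarFile("/path/to/spark-corenlp.jar") // placeholder path
jar.entries().asScala
  .map(_.getName)
  .filter(_.startsWith("com/databricks/spark/corenlp/"))
  .foreach(println)
jar.close()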