Spark Streaming Kafka java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer

Date: 2017-08-22 11:47:43

Tags: apache-kafka spark-streaming spark-streaming-kafka

I'm using Spark Streaming with the Kafka integration. When I run the streaming application from my IDE in local mode, everything works like a charm. But as soon as I submit it to the cluster, I keep getting the following error:

java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer
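
For context, this class is referenced from the Kafka consumer configuration and loaded reflectively at runtime, not at compile time, which is why the problem only shows up on the cluster. A minimal sketch of a typical spark-streaming-kafka-0-10 setup (the broker address, group id, and topic name below are placeholders, not my actual values):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val conf = new SparkConf().setAppName("EstimatorStreamingApp")
val ssc = new StreamingContext(conf, Seconds(5))

// The Kafka consumer instantiates the deserializer classes by reflection,
// so a missing kafka-clients jar surfaces as ClassNotFoundException at runtime.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",           // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "estimator-group"                    // placeholder group id
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))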

I'm using sbt assembly to build my project.
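
(For reference, sbt assembly needs the sbt-assembly plugin declared in project/plugins.sbt; the version below is just an example of one current at the time:)

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")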

My sbt build definition looks like this:

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0" % Provided,
  "org.apache.spark" % "spark-core_2.11" % "2.2.0" % Provided,
  "org.apache.spark" % "spark-streaming_2.11" % "2.2.0" % Provided,
  "org.marc4j" % "marc4j" % "2.8.2",
  "net.sf.saxon" % "Saxon-HE" % "9.7.0-20"
)


run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated


mainClass in assembly := Some("EstimatorStreamingApp")

I've also tried using the --packages option:

Attempt 1

--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0

Attempt 2

--packages org.apache.spark:spark-streaming-kafka-0-10-assembly_2.11:2.2.0
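
In both attempts the full submission looked roughly like this (the master URL and assembly JAR name here are placeholders, not my actual values):

# master URL and JAR path are placeholders
spark-submit \
  --master yarn \
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 \
  --class EstimatorStreamingApp \
  target/scala-2.11/estimator-streaming-app-assembly-1.0.jar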

None of it worked. Does anyone have any suggestions?

1 Answer:

Answer 0 (score: 2)

You need to remove the "Provided" flag from the Kafka dependency, because that artifact is not provided out of the box with Spark: spark-submit puts Spark's own jars (spark-core, spark-streaming) on the cluster classpath, but not the Kafka integration, so it, along with the kafka-clients jar that contains StringDeserializer, has to be bundled into your assembly:

libraryDependencies ++= Seq(
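  // no "Provided" scope on this one: it must end up inside the assembly jar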
  "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0",
  "org.apache.spark" % "spark-core_2.11" % "2.2.0" % Provided,
  "org.apache.spark" % "spark-streaming_2.11" % "2.2.0" % Provided,
  "org.marc4j" % "marc4j" % "2.8.2",
  "net.sf.saxon" % "Saxon-HE" % "9.7.0-20"
)
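
After that change, rebuild the fat JAR and resubmit it (no --packages needed). A quick way to confirm the class actually made it into the assembly (the JAR name below is an example of sbt's default output path, adjust to yours):

sbt assembly
jar tf target/scala-2.11/estimator-streaming-app-assembly-1.0.jar | grep StringDeserializer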