Spark dependency configuration for streaming from Twitter

Time: 2018-04-08 13:05:41

Tags: scala apache-spark twitter sbt

I am trying to run a Spark application that uses Twitter streaming, but I keep running into dependency problems. When I use the org.apache.bahir spark-streaming-twitter dependency, I get this error:

module not found: org.apache.bahir#spark-streaming-twitter;2.0.0

This is the corresponding build.sbt file:

version := "0.1"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.bahir" %% "spark-streaming-twitter" % "2.0.0",
  "org.apache.spark" %% "spark-core" % "2.3.0",
  "org.apache.spark" % "spark-streaming_2.11" % "2.3.0",
  "com.typesafe" % "config" % "1.3.0",
  "org.twitter4j" % "twitter4j-stream" % "4.0.6"
)

But when I use the older streaming dependency instead, I get a ClassNotFoundException: org.apache.spark.Logging error. This is the corresponding build.sbt:

version := "0.1"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.0",
  "org.apache.spark" % "spark-streaming_2.11" % "2.3.0",
  "com.typesafe" % "config" % "1.3.0",
  "org.twitter4j" % "twitter4j-stream" % "4.0.6",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.6.3"
)

To run my application, I run the sbt clean and package commands. So which dependencies should I use, and how should I configure them to run my application?

1 Answer:

Answer 0: (score: 0)

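The Twitter backend was removed from Spark in version 2.0, so the 1.6.3 artifact cannot be used with Spark 2.3.0, and the Bahir version you declared does not match your Spark version. Finally, Bahir's Twitter module already pulls in the twitter4j-stream dependency (4.0.4 at the time of writing). Use: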

val sparkVersion = "2.3.0"

libraryDependencies ++= Seq(
  "org.apache.bahir" %% "spark-streaming-twitter" % sparkVersion,
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)
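
For reference, here is how the dependencies above might be merged into the build.sbt from the question. This is only a sketch: it assumes a Bahir release matching sparkVersion is published for Scala 2.11, and it keeps the version, scalaVersion, and com.typesafe config settings exactly as the question declared them.

```scala
// build.sbt (sketch, not a verified build)
version := "0.1"
scalaVersion := "2.11.12"

// Single version shared by Spark and the Bahir Twitter connector.
// Assumption: Bahir publishes an artifact with this exact version.
val sparkVersion = "2.3.0"

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix (_2.11) to the artifact name
  "org.apache.bahir" %% "spark-streaming-twitter" % sparkVersion,
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  // Carried over unchanged from the question
  "com.typesafe" % "config" % "1.3.0"
  // twitter4j-stream is intentionally not declared here:
  // the Bahir Twitter module brings it in transitively
)
```

Leaving out the explicit twitter4j-stream entry lets the connector decide which twitter4j version ends up on the classpath, which avoids mixing 4.0.6 with the version Bahir was built against.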