I have an sbt-managed Spark project that uses the spark-cloudant dependency. The code is available on GitHub (on the spark-cloudant-compile-issue branch).
I have added the following line to build.sbt:
"cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
so that build.sbt now looks like this:
name := "Movie Rating"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies ++= {
  val sparkVersion = "1.6.0"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-streaming-kafka" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
    "org.apache.kafka" % "kafka-log4j-appender" % "0.9.0.0",
    "org.apache.kafka" % "kafka-clients" % "0.9.0.0",
    "org.apache.kafka" %% "kafka" % "0.9.0.0",
    "cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"
  )
}

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", xs @ _*) => MergeStrategy.first
  case PathList("scala", xs @ _*) => MergeStrategy.discard
  case PathList("META-INF", "maven", "org.slf4j", xs @ _*) => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
unmanagedBase <<= baseDirectory { base => base / "lib" }
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
When I execute sbt assembly, I get the following error:
java.lang.RuntimeException: Please add any Spark dependencies by
supplying the sparkVersion and sparkComponents. Please remove:
org.apache.spark:spark-core:1.6.0:provided
Answer 0 (score: 1)
Possibly related: https://github.com/databricks/spark-csv/issues/150
Could you try adding spIgnoreProvided := true to your build.sbt?
(This may not be the answer; I would have posted it as a comment, but I don't have enough reputation.)
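The spIgnoreProvided setting comes from the sbt-spark-package plugin, which also appears to be what raises the error above. A minimal sketch of the plugin-side configuration, assuming the project uses that plugin (the plugin version shown is illustrative, not taken from the question):

// project/plugins.sbt -- assumption: the sbt-spark-package plugin is enabled here
// (the spark-packages resolver may also be needed in this file)
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.5")

// build.sbt -- declare the Spark version/components to the plugin, and tell it
// not to reject the explicitly "provided" Spark artifacts
sparkVersion := "1.6.0"
sparkComponents ++= Seq("sql", "streaming", "mllib")
spIgnoreProvided := true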
Answer 1 (score: 1)
NOTE: I still could not reproduce the issue, but I don't think that matters.
java.lang.RuntimeException: Please add any Spark dependencies by supplying the sparkVersion and sparkComponents.
In your case, your build.sbt is missing the sbt resolver needed to find the spark-cloudant dependency. You should add the following line to build.sbt:
resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"
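For context, a minimal sketch of how that resolver sits alongside the spark-cloudant dependency in build.sbt (all other settings elided):

// build.sbt -- the spark-packages resolver lets sbt locate the spark-cloudant artifact
resolvers += "spark-packages" at "https://dl.bintray.com/spark-packages/maven/"

libraryDependencies += "cloudant-labs" % "spark-cloudant" % "1.6.4-s_2.10" % "provided"

After adding the resolver, re-run sbt update (or sbt assembly) so it is picked up.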
PROTIP I strongly recommend starting with spark-shell and switching to sbt only once you feel comfortable with the package (especially if you are new to sbt, and perhaps to the other libraries/dependencies as well). It is too much to digest in one go. Follow https://spark-packages.org/package/cloudant-labs/spark-cloudant.
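As a rough sketch of that workflow (assuming a Spark 1.6 installation; the format name and option keys follow the spark-cloudant documentation, and the host, credentials and database name are placeholders):

// launch the shell with the package resolved from spark-packages.org:
//   $SPARK_HOME/bin/spark-shell --packages cloudant-labs:spark-cloudant:1.6.4-s_2.10

// inside spark-shell: read a Cloudant database as a DataFrame
val ratings = sqlContext.read.format("com.cloudant.spark")
  .option("cloudant.host", "ACCOUNT.cloudant.com")
  .option("cloudant.username", "USERNAME")
  .option("cloudant.password", "PASSWORD")
  .load("ratingdb")   // placeholder database name
ratings.printSchema()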