I'm still new to the Scala ecosystem. I'm getting a deduplicate error when trying to assemble a Scala Spark job that uses the DataStax connector. Any suggestions on what might fix this would be appreciated.
My setup:
My build.sbt:
name := "spark-test"
version := "0.0.1"
scalaVersion := "2.11.7"
// additional libraries
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M3"
Console output:
$ sbt assembly
...
[error] 353 errors were encountered during merge
java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/bob/.ivy2/cache/io.netty/netty-all/jars/netty-all-4.0.29.Final.jar:META-INF/io.netty.versions.properties
...
Answer (score: 3):
As I said in the comments, this happens because sbt-assembly does not know how to handle duplicate files when merging the jars. It is typically caused by two dependencies pulling in different versions of the same library. You need to decide which merge strategy to use for the conflicting paths; check the sbt-assembly documentation, but the built-in options include MergeStrategy.first ("keep first"), MergeStrategy.last ("keep last"), and so on.
For reference, here is the merge strategy block I use for Spark projects that don't have too many dependencies:
assemblyMergeStrategy in assembly := {
  case x if x.endsWith(".class") => MergeStrategy.last
  case x if x.endsWith(".properties") => MergeStrategy.last
  case x if x.contains("/resources/") => MergeStrategy.last
  case x if x.startsWith("META-INF/mailcap") => MergeStrategy.last
  case x if x.startsWith("META-INF/mimetypes.default") => MergeStrategy.first
  case x if x.startsWith("META-INF/maven/org.slf4j/slf4j-api/pom.") => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    if (oldStrategy == MergeStrategy.deduplicate)
      MergeStrategy.first
    else
      oldStrategy(x)
}
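The .properties case above is what handles the META-INF/io.netty.versions.properties conflict reported in your error. If you would rather not change the strategy for whole classes of files, a narrower sketch that only overrides the single file named in the error and delegates everything else to the default could look like this:
assemblyMergeStrategy in assembly := {
  // keep the first copy of the exact file from the error message
  case "META-INF/io.netty.versions.properties" => MergeStrategy.first
  case x =>
    // fall back to the default strategy for everything else
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}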
// this jar caused issues so I just exclude it completely
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { _.data.getName == "jetty-util-6.1.26.jar" }
}