Assembly fails when using spark-cassandra-connector

Date: 2016-01-06 16:21:38

Tags: scala apache-spark spark-cassandra-connector

I'm new to the Scala ecosystem. I'm getting a deduplicate error when trying to assemble a Scala Spark job that uses the DataStax connector. I'd appreciate any pointers on what could resolve this.

My system:

  • Latest Scala (2.11.7) installed via brew
  • Latest Spark (2.10.5) installed via brew
  • Latest SBT (0.13.9) installed via brew
  • SBT Assembly plugin installed (wired in as sketched below)
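
For completeness, sbt-assembly is enabled with a one-line plugin declaration under project/. A minimal sketch, assuming sbt 0.13.x; the 0.14.1 version number is an assumption (a release current at the time):

// project/assembly.sbt -- the file name is conventional; any *.sbt file under project/ works
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")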

My build.sbt:

name := "spark-test"

version := "0.0.1"

scalaVersion := "2.11.7"

// additional libraries
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" %     "provided"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M3"

Console:

$ sbt assembly
...
[error] 353 errors were encountered during merge
java.lang.RuntimeException: deduplicate: different file contents found in the following:
/Users/bob/.ivy2/cache/io.netty/netty-all/jars/netty-all-4.0.29.Final.jar:META-INF/io.netty.versions.properties 
... 

1 Answer:

Answer (score: 3)

As I said in the comments, this happens because sbt-assembly doesn't know how to handle duplicate files. It is usually caused by two of your dependencies depending on different versions of the same library. So you need to decide which strategy to use; check the sbt-assembly documentation, but the options include things like keeping the first copy, keeping the last, and so on.

For reference, here is my merge-strategy block for Spark projects that don't have too many dependencies:

assemblyMergeStrategy in assembly := {
  case x if x.endsWith(".class") => MergeStrategy.last
  case x if x.endsWith(".properties") => MergeStrategy.last
  case x if x.contains("/resources/") => MergeStrategy.last
  case x if x.startsWith("META-INF/mailcap") => MergeStrategy.last
  case x if x.startsWith("META-INF/mimetypes.default") => MergeStrategy.first
  case x if x.startsWith("META-INF/maven/org.slf4j/slf4j-api/pom.") => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    if (oldStrategy == MergeStrategy.deduplicate)
      MergeStrategy.first
    else
      oldStrategy(x)
}

// this jar caused issues so I just exclude it completely
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter {_.data.getName == "jetty-util-6.1.26.jar"}
}