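Dependency issues with Scalding, Hadoop, and sbt-assembly

Time: 2013-09-14 21:59:14

Tags: scala hadoop sbt sbt-assembly scalding

I am trying to build a simple Hadoop job with sbt so that I can run it on Amazon EMR. However, when I run sbt assembly, I get the following error: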

[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /Users/trenthauck/.ivy2/cache/org.mortbay.jetty/jsp-2.1/jars/jsp-2.1-6.1.14.jar:org/apache/jasper/compiler/Node$ChildInfo.class
[error] /Users/trenthauck/.ivy2/cache/tomcat/jasper-compiler/jars/jasper-compiler-5.5.12.jar:org/apache/jasper/compiler/Node$ChildInfo.class
[error] Total time: 10 s, completed Sep 14, 2013 4:49:24 PM

I tried following the advice here, https://groups.google.com/forum/#!topic/simple-build-tool/tzkq5TioIqM, but it did not work.

My build.sbt looks like this:

import AssemblyKeys._

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("org", "apache", "jasper", xs @ _*) => MergeStrategy.last
    case x => old(x)
  }
}

assemblySettings

name := "Scaling Play"

version := "SNAPSHOT-0.1"

scalaVersion := "2.10.1"

libraryDependencies ++= Seq(
  "com.twitter" % "scalding-core_2.10" % "0.8.8",
  "com.twitter" % "scalding-args_2.10" % "0.8.8",
  "com.twitter" % "scalding-date_2.10" % "0.8.8",
  "org.apache.hadoop" % "hadoop-core" % "1.0.0"
  )

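1 answer:

Answer 0 (score: 4)

The order of the statements matters. Your build.sbt customizes the merge strategy first and applies assemblySettings afterwards, so the plugin defaults overwrite your change. Declaring assemblySettings first and updating the merge strategy after it fixes the problem.

Updated build.sbt: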

import AssemblyKeys._

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("org", "apache", "jasper", xs @ _*) => MergeStrategy.last
    case x => old(x)
  }
}

…

After that you will find that more classes and other files conflict. In this case you will need the following merge cases:

case PathList("org", "apache", xs @ _*) => MergeStrategy.last
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
case PathList("project.clj") => MergeStrategy.last
case PathList("overview.html") => MergeStrategy.last
case x => old(x)
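
Put together, the top of build.sbt might look like the sketch below. It assumes the same older sbt-assembly syntax used in the question (AssemblyKeys._ and the <<= operator) and only slots the cases listed above into the existing block. The separate jasper case is dropped because the broader org/apache case already matches those paths.

```scala
import AssemblyKeys._

// Apply the plugin defaults first, so the override below is not overwritten.
assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // For each conflicting path, keep the last copy found on the classpath.
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("project.clj") => MergeStrategy.last
    case PathList("overview.html") => MergeStrategy.last
    // Everything else falls back to the plugin's default strategy.
    case x => old(x)
  }
}
```

The fallback case must stay last, because the cases are tried in order and it matches every path.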

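Note that applying a merge strategy to class files can cause problems when the conflicting copies come from incompatible versions of the same class. If that is the case, you have a bigger problem, because the dependencies themselves are incompatible with each other. You will then have to remove a dependency and find or create a compatible version.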
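
As one illustration of removing a dependency, the sketch below uses sbt's exclude method on a module. Which artifact to drop is an assumption: it is guided only by the two jars named in the error message from the question, and it further assumes that jasper-compiler arrives transitively through hadoop-core. Check your own dependency tree before excluding anything.

```scala
libraryDependencies ++= Seq(
  "com.twitter" % "scalding-core_2.10" % "0.8.8",
  "com.twitter" % "scalding-args_2.10" % "0.8.8",
  "com.twitter" % "scalding-date_2.10" % "0.8.8",
  // Assumption: drop the tomcat jasper-compiler jar and keep the jetty jsp-2.1 copy.
  // The organization ("tomcat") and name ("jasper-compiler") are read off the
  // Ivy cache path shown in the error message.
  "org.apache.hadoop" % "hadoop-core" % "1.0.0" exclude("tomcat", "jasper-compiler")
)
```

Unlike a merge strategy, an exclusion removes the whole jar from the assembly, so it only helps if nothing at runtime needs classes that exist solely in the excluded artifact.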