I'm facing the exact problem described in the post "sbt-assembly: deduplication found error", and the answers suggested there didn't help.
[error] (*:assembly) deduplicate: different file contents found in the following:
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.transaction\orbits\javax.transaction-1.1.1.v201105210645.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.servlet\orbits\javax.servlet-3.0.0.v201112011016.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.mail.glassfish\orbits\javax.mail.glassfish-1.4.1.v201005082020.jar:META-INF/ECLIPSEF.RSA
[error] C:\Users\xxx\.ivy2\cache\org.eclipse.jetty.orbit\javax.activation\orbits\javax.activation-1.1.0.v201105071233.jar:META-INF/ECLIPSEF.RSA
[error] Total time: 14 s, completed Sep 9, 2014 5:21:01 PM
My build.sbt file contains:
name := "Simple"

version := "0.1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.twitter4j" % "twitter4j-stream" % "3.0.3"
)

//libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.2"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.2"

libraryDependencies += "com.github.nscala-time" %% "nscala-time" % "0.4.2"

libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % "1.0.2").
    exclude("org.eclipse.jetty.orbit", "javax.servlet").
    exclude("org.eclipse.jetty.orbit", "javax.transaction").
    exclude("org.eclipse.jetty.orbit", "javax.mail").
    exclude("org.eclipse.jetty.orbit", "javax.activation").
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-collections", "commons-collections").
    exclude("com.esotericsoftware.minlog", "minlog")
)

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("javax", "transaction", xs @ _*) => MergeStrategy.first
    case PathList("javax", "mail", xs @ _*) => MergeStrategy.first
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.first
    case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt" => MergeStrategy.discard
    case x => old(x)
  }
}
Any pointers on how to fix this?
Answer 0 (score: 3)
If you plan to run your program from Spark, I strongly recommend adding all Spark dependencies as provided, so that they are excluded from the assembly task.
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.0.2" % "provided",
"org.apache.spark" %% "spark-streaming" % "1.0.2" % "provided",
"org.apache.spark" %% "spark-streaming-twitter" % "1.0.2" % "provided")
Otherwise, you need to either remove those jars from the classpath or add an appropriate line to mergeStrategy. In your case it would be:
case PathList("META-INF", "ECLIPSEF.RSA") => MergeStrategy.first
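To show where that line goes, here is the question's merge strategy with the case added. This is a sketch assuming the same sbt-assembly 0.11-era `mergeStrategy in assembly <<=` syntax the question already uses. Cases are tried in order, so the new case has to sit above the catch-all `case x => old(x)`; otherwise the default strategy `old` is still consulted for that path and still fails. The existing `javax/...` cases do not help here because they match class files under `javax/`, not the conflicting `META-INF/ECLIPSEF.RSA` entries.

```scala
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // The conflicting entries from the jetty-orbit jars: keep the first copy found
    case PathList("META-INF", "ECLIPSEF.RSA") => MergeStrategy.first
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("javax", "transaction", xs @ _*) => MergeStrategy.first
    case PathList("javax", "mail", xs @ _*) => MergeStrategy.first
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.first
    case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt" => MergeStrategy.discard
    // Everything else falls back to the plugin's default strategy
    case x => old(x)
  }
}
```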
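If you still want to deal with Spark's dependency hell yourself, the sbt-dependency-graph plugin should help. Also note that other Spark dependencies, such as spark-streaming and spark-streaming-twitter, may need exclude directives as well.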
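One way to write such exclusions for the other Spark modules is sbt's `excludeAll` with an `ExclusionRule`, which drops every artifact of an organization instead of listing them one by one. This is a sketch assuming sbt 0.13, meant to replace the two plain `+=` lines for these modules in the question's build.sbt. It also assumes the application does not need the jetty-orbit jars at runtime, which is worth checking before relying on it.

```scala
// Assumption: no org.eclipse.jetty.orbit artifact is needed at runtime,
// mirroring the per-artifact excludes already applied to spark-core
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-streaming" % "1.0.2")
    .excludeAll(ExclusionRule(organization = "org.eclipse.jetty.orbit")),
  ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.2")
    .excludeAll(ExclusionRule(organization = "org.eclipse.jetty.orbit"))
)
```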
Answer 1 (score: 0)
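So, to make the annoying "deduplicate" message go away, I didn't bother with excluding things; that didn't seem to help me. I copied and pasted defaultMergeStrategy from the sbt-assembly code and just changed the line that refers to deduplicate to first. I also had to add a catch-all at the end that sticks with first. To be honest, I have no idea what this means or why it makes the annoying message go away... I don't have time to get a PhD, I want my code to build! So the merge strategy becomes: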
mergeStrategy in assembly <<= (mergeStrategy in assembly) ((old) => {
  case x if Assembly.isConfigFile(x) =>
    MergeStrategy.concat
  case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
    MergeStrategy.rename
  case PathList("META-INF", xs @ _*) =>
    (xs map {_.toLowerCase}) match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.first // Changed deduplicate to first
    }
  case PathList(_*) => MergeStrategy.first // added this line
})
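Why the change works can be sketched with a small standalone model of that dispatch. `Strategy`, `strategyFor` and `MergeDemo` are hypothetical stand-ins, not sbt-assembly's API; they only mirror the `META-INF` branch quoted above. That branch discards signature files ending in `.sf` or `.dsa`, but `ECLIPSEF.RSA` (lowercased to `eclipsef.rsa`) matches neither, so it reaches the fallback case. With `deduplicate` as the fallback the build fails, because the four jetty-orbit jars carry different copies of that file; with `first`, one copy is simply kept.

```scala
// Hypothetical simplified model of sbt-assembly's merge-strategy dispatch;
// not the plugin's real types.
sealed trait Strategy
case object Discard extends Strategy
case object First extends Strategy
case object Deduplicate extends Strategy

object MergeDemo {
  // Mirrors the META-INF branch of the strategy above; `fallback` is what
  // the final cases return (deduplicate in the default, first in the answer).
  def strategyFor(path: String, fallback: Strategy): Strategy =
    path.split("/").toList match {
      case "META-INF" :: rest =>
        rest.map(_.toLowerCase) match {
          case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
            Discard
          case ps @ (_ :: _) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
            Discard
          case _ => fallback
        }
      case _ => fallback
    }

  def main(args: Array[String]): Unit = {
    val entry = "META-INF/ECLIPSEF.RSA"
    println(strategyFor(entry, Deduplicate))                  // Deduplicate: the build fails
    println(strategyFor(entry, First))                        // First: one copy is kept
    println(strategyFor("META-INF/ECLIPSEF.SF", Deduplicate)) // Discard: .sf already handled
  }
}
```

Adding `.rsa` to the discard case, in line with how `.sf` and `.dsa` are treated, is another option worth trying, since those signatures describe the original jars rather than the assembled one.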