Attempting to build a fat jar with sbt fails with errors like the following:
java.lang.RuntimeException: deduplicate: different file contents found in the following:
C:\Users\db\.ivy2\cache\org.apache.spark\spark-network-common_2.10\jars\spark-network-common_2.10-1.6.3.jar:com/google/common/base/Function.class
C:\Users\db\.ivy2\cache\com.google.guava\guava\bundles\guava-14.0.1.jar:com/google/common/base/Function.class
There are many such classes; this is just one example. Guava 14.0.1 is the version of Function.class found in both jars:
[info] +-com.google.guava:guava:14.0.1
...
[info] | | +-com.google.guava:guava:14.0.1
That means sbt/ivy isn't picking one of them as the newer version, yet the sizes and dates inside the jars differ, which presumably leads to the error above:
$ jar tvf /c/Users/db/.ivy2/cache/org.apache.spark/spark-network-common_2.10/jars/spark-network-common_2.10-1.6.3.jar | grep "com/google/common/base/Function.class"
549 Wed Nov 02 16:03:20 CDT 2016 com/google/common/base/Function.class
$ jar tvf /c/Users/db/.ivy2/cache/com.google.guava/guava/bundles/guava-14.0.1.jar | grep "com/google/common/base/Function.class"
543 Thu Mar 14 19:56:52 CDT 2013 com/google/common/base/Function.class
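As an aside, one way to see which dependencies pull Guava in is the whatDependsOn task of the sbt-dependency-graph plugin; this assumes the plugin is added to the build, which it is not in the snippets shown here:

// project/plugins.sbt -- hypothetical plugin setup, not part of the original build
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")

Then, in the sbt shell:

> whatDependsOn com.google.guava guava 14.0.1

This prints the reverse dependency tree for guava 14.0.1, i.e. every module that brings it in, which is how an excerpt like the one above can be confirmed.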
It looks like Apache is recompiling Function.class from source rather than shipping the class as originally compiled. Is that a correct reading of what is happening here? The recompiled classes can be excluded via sbt, but is there a way to build the jar without explicitly excluding every jar that contains a recompiled copy of the same source? Excluding jars explicitly leads to something along the lines of the snippet below, which seems like it is taking me down the wrong path:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3" excludeAll(
  ExclusionRule(organization = "com.twitter"),
  ExclusionRule(organization = "org.apache.spark", name = "spark-network-common_2.10"),
  ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-client"),
  ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-hdfs"),
  ExclusionRule(organization = "org.tachyonproject", name = "tachyon-client"),
  ExclusionRule(organization = "commons-beanutils", name = "commons-beanutils"),
  ExclusionRule(organization = "commons-collections", name = "commons-collections"),
  ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-yarn-api"),
  ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-yarn-common"),
  ExclusionRule(organization = "org.apache.curator", name = "curator-recipes")
)

libraryDependencies += "org.apache.spark" %% "spark-network-common" % "1.6.3" exclude("com.google.guava", "guava")
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.6.3"
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging-slf4j" % "2.1.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" exclude("com.google.guava", "guava")
libraryDependencies += "com.google.guava" % "guava" % "14.0.1"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.11"
libraryDependencies += "org.json4s" %% "json4s-ext" % "3.2.11"
libraryDependencies += "com.rabbitmq" % "amqp-client" % "4.1.1"
libraryDependencies += "commons-codec" % "commons-codec" % "1.10"
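For reference, another route commonly taken with sbt-assembly (assumed here to be the plugin building the fat jar, since the deduplicate error is worded like its output) is to keep one copy of the conflicting classes via a merge strategy rather than excluding whole jars. A minimal sketch, not part of the build above:

assemblyMergeStrategy in assembly := {
  // Keep the first copy of the recompiled Guava classes instead of failing on deduplicate.
  case PathList("com", "google", "common", xs @ _*) => MergeStrategy.first
  case x =>
    // Fall back to sbt-assembly's default strategy for everything else.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

This silences the error, but whichever copy wins is essentially arbitrary, so it is more of a workaround than a fix.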
If excluding jars like that is the wrong path, what is the cleaner way?
Answer 0 (score: 1)
If excluding jars like that is the wrong path, what is the cleaner way?
The cleaner way is to not package spark-core at all. It is available on the target machine once Spark is installed there, and it is available to your application at runtime (you can usually find it under /usr/lib/spark/jars). You should mark these Spark dependencies as % "provided". That helps you avoid a lot of the conflicts caused by packaging those jars.
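A minimal sketch of what that looks like in build.sbt, reusing the versions from the question (and assuming the fat jar is built with sbt-assembly and launched through spark-submit):

// Spark is supplied by the cluster at runtime, so keep it out of the assembly.
libraryDependencies += "org.apache.spark" %% "spark-core"   % "1.6.3" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.6.3" % "provided"

// Application-only libraries still get packaged into the fat jar.
libraryDependencies += "com.rabbitmq" % "amqp-client" % "4.1.1"

With the Spark artifacts marked provided, the Guava copy they drag in no longer lands in the assembly, and spark-submit puts the cluster's own Spark jars (for example those under /usr/lib/spark/jars) on the classpath at runtime.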