Spark 2 sbt assembly deduplicate error with Scala 2.11.8

Date: 2017-01-30 23:07:52

Tags: scala apache-spark postgis

I am trying to build an uber jar so that I can deploy my Spark program. Running

sbt assembly

outputs a lot of errors:

[error] deduplicate: different file contents found in the following:
[error] /Users/samibadawi/.ivy2/cache/commons-collections/commons-collections/jars/commons-collections-3.2.1.jar:org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class
[error] /Users/samibadawi/.ivy2/cache/commons-beanutils/commons-beanutils/jars/commons-beanutils-1.7.0.jar:org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class

The answers to the same question for Scala 2.10 did not work for me: spark + sbt-assembly: "deduplicate: different file contents found in the following"

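After a lot of hacking I got a hello world project, with no useful code in it, to compile using the build.sbt file below.

Which dependencies to exclude and which merge strategy to pick seems random. Is there a simpler, more systematic way to do this?

(Other than using "org.apache.spark" %% "spark-core" % sparkVersion % "provided", in which case the dependencies are not deployed with the jar.)

Excerpt from build.sbt: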

import sbtassembly.AssemblyPlugin._

//Define dependencies. These ones are only required for Test and Integration Test scopes.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % sparkVersion).
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-logging", "commons-logging").
    exclude("com.esotericsoftware.minlog", "minlog").
    exclude("com.codahale.metrics", "metrics-core").
    exclude("aopalliance","aopalliance")
    ,
  "org.scalatest"   %% "scalatest"    % "2.2.4"   % "test,it"
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x => old(x)
  }
}

Project.inConfig(Test)(assemblySettings)
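For readers on a newer toolchain: the `<<=` operator used above was deprecated in later sbt 0.13 releases and is gone in sbt 1.x, and newer sbt-assembly releases name the key `assemblyMergeStrategy` rather than `mergeStrategy`. A minimal sketch of the same fallback pattern in `:=` form follows, assuming sbt-assembly 0.14.x. Only a few of the cases from the excerpt are repeated, and `MergeStrategy.last` is carried over from it rather than recommended.

```scala
// build.sbt fragment (sketch; assumes the sbt-assembly 0.14.x plugin is enabled)
assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*)    => MergeStrategy.last
  case "log4j.properties"                    => MergeStrategy.last
  case x =>
    // Fall back to the plugin's default strategy for every other path
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

The final case plays the same role as `case x => old(x)` in the excerpt: any path without an explicit rule is handled by the default strategy.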

1 Answer:

Answer 0 (score: 0)

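After some more trial and error I produced a build.sbt that works for my real program.

One problem I ran into was duplicate versions of the Postgres jar. I solved it by commenting out these dependencies: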

//  "org.postgresql" % "postgresql" % "9.4.1212", //Small gap between Doobie and Spark dependency
//  "org.postgis" % "postgis-jdbc" % "1.3.3", //Causes conflicts

I have not started using PostGIS yet; it depends on postgresql-8.3-603.jdbc4.jar.

I had to take out the direct dependency on Postgres.

From the working build.sbt:

    val doobieVersion = "0.4.1"

libraryDependencies ++= Seq(
  "ch.qos.logback" % "logback-classic" % "1.0.13", //comment and warning go away
  "ch.qos.logback" % "logback-core" % "1.0.13",
  "com.citymaps" % "tile-library" % "1.4",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.7.2",
  "com.github.scopt" %% "scopt" % "3.5.0",
  "com.typesafe.play" %% "play-json" % "2.5.9",
  "org.apache.spark" %% "spark-core" % sparkVersion  % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
  "graphframes" % "graphframes" % "0.3.0-spark2.0-s_2.11",
  "org.clapper" %% "grizzled-slf4j" % "1.3.0",
//  "org.postgresql" % "postgresql" % "9.4.1212", //Small gap between Doobie and Spark dependency
//  "org.postgis" % "postgis-jdbc" % "1.3.3", //Causes conflicts
  "org.scalatest" %% "scalatest" % "3.0.0" % "test" withSources() withJavadoc(),
  "org.spire-math" %% "spire" % "0.11.0",
  "org.tpolecat" %% "doobie-core-cats" % doobieVersion,
  "org.tpolecat" %% "doobie-postgres-cats"   % doobieVersion
)

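After running

sbt clean

this stopped working. It turned out that postgis-jdbc was the source of the conflict: its latest version is 2.2.1, but the latest version available in the usual Maven repositories is 1.3.3, which depends on an old Postgres driver jar.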
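One possible alternative, which is not what was done here and is untested against this build, would be to keep postgis-jdbc 1.3.3 from the public repository and exclude its transitive dependency on the old driver, so that only one Postgres driver reaches the assembly. The coordinates `postgresql` / `postgresql` for the old driver are an assumption inferred from the jar name postgresql-8.3-603.jdbc4.jar; check the dependency report before relying on them.

```scala
// build.sbt fragment (sketch, untested against this project)
// Assumption: the old 8.3 driver is published under organization "postgresql", name "postgresql"
libraryDependencies +=
  ("org.postgis" % "postgis-jdbc" % "1.3.3").exclude("postgresql", "postgresql")
```

Whether the 1.3.3 classes behave correctly against a newer driver would still need to be verified.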

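I looked through a lot of repositories and could not find postgis-jdbc 2.2.1.

Download the 2.2.1 release from https://github.com/postgis/postgis-java

The build in this release has its version set to 2.2.2SNAPSHOT, so change the version number in pom.xml and jdbc/pom.xml to match the 2.2.1 dependency declared in build.sbt.

Build the jar with this command. It is picky about the Maven version: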

/usr/local/Cellar/maven/3.3.9/bin/mvn install

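Now include this dependency:

resolvers ++= Seq(
  Resolver.mavenLocal // finds the jar that mvn install placed in the local Maven repository
)

// Equivalent to adding this entry to the libraryDependencies Seq
libraryDependencies += "net.postgis" % "postgis-jdbc" % "2.2.1"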

Then running

sbt assembly

finally worked.