Trying to submit a Spark Kafka job (with a fat JAR), but sbt assembly fails with deduplicate errors

Time: 2018-10-29 08:46:25

Tags: scala apache-spark apache-kafka spark-streaming

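I am developing a Scala Spark Streaming application that receives data from Kafka. Because of some dependency problems, I am trying to build a fat JAR. However, when I run the assembly task in sbt, I get many "deduplicate" errors. My build.sbt file is: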

name := "Streaming Kafka"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.0"

assemblyMergeStrategy in assembly := {
  case PathList("org","aopalliance", xs @ _*) => MergeStrategy.last
  case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
  case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
  case PathList("org", "apache", xs @ _*) => MergeStrategy.last
  case PathList("com", "google", xs @ _*) => MergeStrategy.last
  case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
  case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
  case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
  case "about.html" => MergeStrategy.rename
  case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
  case "META-INF/mailcap" => MergeStrategy.last
  case "META-INF/mimetypes.default" => MergeStrategy.last
  case "plugin.properties" => MergeStrategy.last
  case "log4j.properties" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

assembly.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
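
For comparison, a common pattern when a job is launched with spark-submit is to mark the Spark modules that the cluster already ships as "provided", so they stay out of the fat JAR. The following is a minimal sketch under that assumption, not a verified fix for this build. It reuses the versions from the build.sbt above and leaves spark-streaming-kafka-0-10 in the default scope, because that connector is not bundled with the Spark distribution.

```scala
// build.sbt (sketch): assumes the job is started with spark-submit,
// so spark-core and spark-streaming are already on the cluster classpath.
name := "Streaming Kafka"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // "provided": available at compile time, excluded from the assembled JAR
  "org.apache.spark" %% "spark-core" % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.3.0" % "provided",
  // The Kafka connector is not part of the Spark distribution, so it is packaged
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.0"
)
```

Note that "provided" only affects managed dependencies. If JARs reach the build some other way (for example as unmanaged JARs copied or linked from a local Spark install), this scoping alone would not keep them out of the assembly.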

Part of the log I get (this is only an excerpt, because the full log is too long):

[error] 111 errors were encountered during merge
[error] java.lang.RuntimeException: deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/arrow-vector-0.8.0.jar:codegen/config.fmpp
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/calcite-core-1.2.0-incubating.jar:codegen/config.fmpp
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/arrow-format-0.8.0.jar:git.properties
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/arrow-vector-0.8.0.jar:git.properties
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/arrow-memory-0.8.0.jar:git.properties
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/EventFilter.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/EventFilter.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/FactoryConfigurationError.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/FactoryConfigurationError.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/FactoryFinder$1.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/FactoryFinder$1.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/FactoryFinder$ClassLoaderFinder.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/FactoryFinder$ClassLoaderFinder.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/FactoryFinder$ClassLoaderFinderConcrete.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/FactoryFinder$ClassLoaderFinderConcrete.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/FactoryFinder.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/FactoryFinder.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/Location.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/Location.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/StreamFilter.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/StreamFilter.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLEventFactory.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLEventFactory.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLEventReader.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLEventReader.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLEventWriter.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLEventWriter.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLInputFactory.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLInputFactory.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLOutputFactory.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLOutputFactory.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLReporter.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLReporter.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLResolver.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLResolver.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLStreamConstants.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLStreamConstants.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLStreamException.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLStreamException.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLStreamReader.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLStreamReader.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/XMLStreamWriter.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/XMLStreamWriter.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/Attribute.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/Attribute.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/Characters.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/Characters.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/Comment.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/Comment.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/DTD.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/DTD.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/EndDocument.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/EndDocument.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/EndElement.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/EndElement.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/EntityDeclaration.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/EntityDeclaration.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/EntityReference.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/EntityReference.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/Namespace.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/Namespace.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/NotationDeclaration.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/NotationDeclaration.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/ProcessingInstruction.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/ProcessingInstruction.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/StartDocument.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/StartDocument.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/StartElement.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/StartElement.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/events/XMLEvent.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/events/XMLEvent.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/util/EventReaderDelegate.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/util/EventReaderDelegate.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/util/StreamReaderDelegate.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/util/StreamReaderDelegate.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/util/XMLEventAllocator.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/util/XMLEventAllocator.class
[error] deduplicate: different file contents found in the following:
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0.1.jar:javax/xml/stream/util/XMLEventConsumer.class
[error] /usr/local/spark/spark-2.3.0-bin-hadoop2.7/jars/stax-api-1.0-2.jar:javax/xml/stream/util/XMLEventConsumer.class
[error] deduplicate: different file contents found in the following:



[error] Total time: 286 s, completed 28-oct-2018 18:33:01
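
The conflicts in this excerpt fall into three groups: the `javax/xml/stream` classes shipped by both stax-api JARs, `codegen/config.fmpp`, and `git.properties`. As a hedged illustration only, extra merge-strategy cases for exactly those paths could look like the sketch below. The choice of `MergeStrategy.first` for the class files is an assumption that either copy works at runtime, and the excerpt covers only part of the 111 reported conflicts, so this is not a complete or verified configuration.

```scala
// build.sbt fragment (sketch): needs the sbt-assembly plugin declared in assembly.sbt.
assemblyMergeStrategy in assembly := {
  // Both stax-api JARs contain javax.xml.stream classes; keep the first copy found
  // (assumption: the two versions are interchangeable for this job).
  case PathList("javax", "xml", "stream", xs @ _*) => MergeStrategy.first
  // Template config duplicated across arrow-vector and calcite-core.
  case PathList("codegen", "config.fmpp") => MergeStrategy.first
  // Build metadata from the Arrow JARs; not needed at runtime.
  case "git.properties" => MergeStrategy.discard
  // Everything else falls back to the plugin's default strategy.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

In a real build these cases would be merged into the existing `assemblyMergeStrategy` block rather than defined a second time, since a later `:=` assignment replaces the earlier one.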

0 answers:

No answers