Spark 2.2.1 compatibility with Jackson version 2.8.8

Date: 2017-12-23 10:58:58

Tags: java eclipse scala maven apache-spark

My setup is:

  • Scala 2.11 (Scala IDE plugin)
  • Eclipse Neon.3 Release (4.6.3)
  • Windows 7 64-bit

I want to run this simple Scala program (Esempio.scala):

package it.scala

// import Spark packages
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf


object Wordcount {
    def main(args: Array[String]) {

        val inputs: Array[String] = new Array[String](2)
        inputs(0) = "C:\\Users\\FobiDell\\Desktop\\input"
        inputs(1) = "C:\\Users\\FobiDell\\Desktop\\output"

        // SparkConf object to set the application's parameters,
        // later handed to the chosen cluster manager (YARN, Mesos or Standalone).
        val conf = new SparkConf()
        conf.setAppName("Smartphone Addiction")
        conf.setMaster("local")

        // SparkContext object to connect to the chosen cluster manager
        val sc = new SparkContext(conf)

        //Read file and create RDD
        val rawData = sc.textFile(inputs(0))

        //convert the lines into words using flatMap operation
        val words = rawData.flatMap(line => line.split(" "))

        //count the individual words using map and reduceByKey operation
        val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)

        //Save the result
        wordCount.saveAsTextFile(inputs(1))

        //stop the spark context
        sc.stop()

    }

}

So: if I use the spark-shell, everything works fine. From the Eclipse IDE, however, if I select the file (Esempio.scala) and run it via Run -> Run As -> Scala Application, I get this exception:

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
    at it.scala.Wordcount$.main(Esempio.scala:47)
    at it.scala.Wordcount.main(Esempio.scala)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.8.8
    at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
    at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
    at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:745)
    at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
    at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
    ... 4 more  
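A quick way to see which Jackson jar is actually picked up at runtime is to print where the ObjectMapper class named in the trace is loaded from (a small diagnostic sketch, not part of the original program):

    // Prints the jar that provides com.fasterxml.jackson.databind.ObjectMapper
    println(classOf[com.fasterxml.jackson.databind.ObjectMapper]
      .getProtectionDomain.getCodeSource.getLocation)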

My pom.xml file is:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>it.hgfhgf.xhgfghf</groupId>
  <artifactId>progetto</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>progetto</name>
  <url>http://maven.apache.org</url>

  <properties>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <!-- Neo4j JDBC DRIVER -->
    <dependency>
      <groupId>org.neo4j</groupId>
      <artifactId>neo4j-jdbc-driver</artifactId>
      <version>3.1.0</version>
    </dependency>

    <!-- Scala -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.11.11</version>
    </dependency> 

    <!-- Spark -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.2.1</version>
    </dependency>


  </dependencies>


</project>

I noticed that the .jar files in the spark-2.2.1-bin-hadoop2.7/jars directory include:

  • jackson-core-2.6.5.jar
  • jackson-databind-2.6.5.jar
  • jackson-module-paranamer-2.6.5.jar
  • jackson-module-scala_2.11-2.6.5.jar
  • jackson-annotations-2.6.5.jar

Can anyone explain to me, in plain terms, what this exception is and how I can fix it?

5 Answers:

Answer 0 (score: 10)

Spark 2.x ships with Jackson 2.6.5, while neo4j-jdbc-driver pulls in Jackson 2.8.8, so you have a dependency conflict between two different versions of the Jackson library. That is why you get the Incompatible Jackson version: 2.8.8 error.
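To confirm where each Jackson version is coming from, you can ask Maven for the relevant slice of the dependency tree (a standard Maven command, shown here as a diagnostic aid):

    mvn dependency:tree -Dincludes=com.fasterxml.jackson.core,com.fasterxml.jackson.module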

Try overriding the versions of the modules below in your pom.xml and see if that works:

  1. jackson-core
  2. jackson-databind
  3. jackson-module-scala_2.x
  4. Or try adding the following dependency to your pom.xml:

            <dependency>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>jackson-module-scala_2.11</artifactId>
                <version>2.8.8</version>
            </dependency> 
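If the driver declares Jackson as a regular (non-shaded) dependency, another option is to keep Spark's Jackson and exclude the newer one that neo4j-jdbc-driver drags in, using Maven's standard exclusion mechanism (a sketch, not from the original answer):

            <dependency>
                <groupId>org.neo4j</groupId>
                <artifactId>neo4j-jdbc-driver</artifactId>
                <version>3.1.0</version>
                <exclusions>
                    <exclusion>
                        <groupId>com.fasterxml.jackson.core</groupId>
                        <artifactId>jackson-databind</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>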
    

Answer 1 (score: 10)

Not sure if this helps those hitting the issue in an sbt project with Scala 2.12. Putting jackson-module-scala_2.11 in did not quite work; 2.6.7 is the only jackson-module-scala release line that also has a Scala 2.12 build.

The following lines in build.sbt did the trick (a sketch using sbt's dependencyOverrides key, pinning Jackson to the 2.6.7 line mentioned above; adjust the versions to your build):

    dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7"
    dependencyOverrides += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.7.1"

This solved the problem for Spark 2.4.5.

Answer 2 (score: 2)

I ran into the same Jackson version conflict. On top of overriding jackson-core, jackson-databind, and jackson-module-scala_2.x, I also pinned jackson-annotations in my pom.xml, and that resolved the conflict.
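A sketch of what that looks like, assuming version 2.6.5 to match the Jackson jars that Spark 2.2.1 bundles (use whatever version your Spark distribution ships):

    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-annotations</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.module</groupId>
        <artifactId>jackson-module-scala_2.11</artifactId>
        <version>2.6.5</version>
    </dependency>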

Answer 3 (score: 2)

Spark 2.2.1 works with Jackson 2.6.5. Use the following:

    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.6.5</version>
    </dependency>
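Pinning to 2.6.5 matches the Jackson jars bundled in the spark-2.2.1-bin-hadoop2.7/jars directory listed in the question, so Spark's own code sees the Jackson version it was built against.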

Answer 4 (score: 0)

Below is the combination that worked for me.

 aws-java-sdk-1.7.4.jar
 hadoop-aws-2.7.3.jar
 joda-time-2.9.6.jar
 hadoop-client-2.7.3-sources.jar
 hadoop-client-2.7.3.jar
 hadoop-client-2.6.0-javadoc.jar
 hadoop-client-2.6.0.jar
 jets3t-0.9.4.jar
 jackson-core-2.10.0.jar
 jackson-databind-2.8.6.jar
 jackson-module-scala_2.11-2.8.5.jar
 jackson-annotations-2.8.7.jar