将Spark Connector用于CosmosDB时,杰克逊版本不兼容

时间:2019-02-20 17:30:58

标签: scala apache-spark apache-spark-sql azure-cosmosdb apache-spark-dataset

当我尝试使用Scala中的cosmosdb连接器从SparkSql数据集读取时,出现异常。我在本地模式下运行Spark。

我的sbt依赖项如下:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.3.1" % Provided withSources() withJavadoc(),
  "com.microsoft.azure"  % "azure-cosmosdb-spark_2.3.0_2.11" % "1.3.3" % Provided withSources() withJavadoc(),
  "org.specs2" %% "specs2-core" % "4.3.4" % Test withSources() withJavadoc()
)

我正在运行的代码如下:

import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark.config.Config
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.{Dataset, SparkSession}
import org.specs2.mutable.Specification

val spark: SparkSession = SparkSession.builder.master("local").getOrCreate()
val schema: StructType = ScalaReflection.schemaFor[MyCaseClass].dataType.asInstanceOf[StructType]

val configMap: Config = Config(Map(
    "Endpoint" -> "https://mydocdb.documents.azure.com:443/",
    "Masterkey" -> "supersecretvalue==",
    "Database" -> "mydatabase",
    "Collection" -> "mycollection"))

import spark.implicits._
val data: Dataset[MyCaseClass] = spark.read.schema(schema).cosmosDB(configMap).as[MyCaseClass]

println(data.count()) // Boom!

我得到的例外是:

Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.5

com.fasterxml.jackson.databind正在对某些版本进行运行时检查,但不喜欢。

我打印了我的库的依赖关系树并得到了:

[info] com.custom.library:mylib_2.11:na [S]
[info]   +-com.microsoft.azure:azure-cosmosdb-spark_2.3.0_2.11:1.3.3 [S]
[info]   | +-com.fasterxml.jackson.core:jackson-databind:2.9.5
[info]   | | +-com.fasterxml.jackson.core:jackson-annotations:2.9.0
[info]   | | +-com.fasterxml.jackson.core:jackson-core:2.9.5
...
[info]   +-org.apache.spark:spark-sql_2.11:2.3.1
[info]     +-com.fasterxml.jackson.core:jackson-databind:2.6.7.1 (evicted by: 2.9.5)
[info]     | +-com.fasterxml.jackson.core:jackson-annotations:2.6.0 (evicted by: 2.9.0)
[info]     | +-com.fasterxml.jackson.core:jackson-annotations:2.6.7 (evicted by: 2.9.0)
[info]     | +-com.fasterxml.jackson.core:jackson-annotations:2.9.0
[info]     | +-com.fasterxml.jackson.core:jackson-core:2.6.7 (evicted by: 2.9.5)
[info]     | +-com.fasterxml.jackson.core:jackson-core:2.7.9 (evicted by: 2.9.5)
[info]     | +-com.fasterxml.jackson.core:jackson-core:2.9.5

完整的堆栈跟踪为:

java.lang.ExceptionInInitializerError
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.map(RDD.scala:370)
        at com.microsoft.azure.cosmosdb.spark.schema.CosmosDBRowConverter$.asRow(CosmosDBRowConverter.scala:69)
        at com.microsoft.azure.cosmosdb.spark.schema.CosmosDBRelation.buildScan(CosmosDBRelation.scala:78)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:338)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:337)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:393)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:333)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
        at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3249)
        at org.apache.spark.sql.Dataset.count(Dataset.scala:2769)
        at com.custom.Logic.MyClass$.apply(MyClass.scala:28)
        at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
        at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
        at org.specs2.execute.Result$$anon$11.asResult(Result.scala:246)
        at org.specs2.execute.AsResult$.apply(AsResult.scala:32)
        at org.specs2.specification.core.AsExecution$$anon$1$$anonfun$execute$1.apply(AsExecution.scala:15)
        at org.specs2.specification.core.AsExecution$$anon$1$$anonfun$execute$1.apply(AsExecution.scala:15)
        at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
        at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
        at org.specs2.execute.Result$$anon$11.asResult(Result.scala:246)
        at org.specs2.execute.AsResult$.apply(AsResult.scala:32)
        at org.specs2.execute.AsResult$$anonfun$safely$1.apply(AsResult.scala:40)
        at org.specs2.execute.AsResult$$anonfun$safely$1.apply(AsResult.scala:40)
        at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
        at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
        at org.specs2.execute.AsResult$.safely(AsResult.scala:40)
        at org.specs2.specification.core.Execution$$anonfun$result$1.apply(Execution.scala:305)
        at org.specs2.specification.core.Execution$$anonfun$result$1.apply(Execution.scala:305)
        at org.specs2.specification.core.Execution$$anonfun$withEnvSync$1$$anonfun$apply$20$$anonfun$apply$21.apply(Execution.scala:323)
        at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
        at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
        at org.specs2.execute.Result$$anon$11.asResult(Result.scala:246)
        at org.specs2.execute.AsResult$.apply(AsResult.scala:32)
        at org.specs2.execute.AsResult$$anonfun$safely$1.apply(AsResult.scala:40)
        at org.specs2.execute.AsResult$$anonfun$safely$1.apply(AsResult.scala:40)
        at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
        at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
        at org.specs2.execute.AsResult$.safely(AsResult.scala:40)
        at org.specs2.specification.core.Execution$$anonfun$withEnvSync$1$$anonfun$apply$20.apply(Execution.scala:323)
        at org.specs2.specification.core.Execution$$anonfun$withEnvSync$1$$anonfun$apply$20.apply(Execution.scala:323)
        at org.specs2.specification.core.Execution$$anonfun$2$$anonfun$apply$5.apply(Execution.scala:135)
        at org.specs2.specification.core.Execution$$anonfun$2$$anonfun$apply$5.apply(Execution.scala:135)
        at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
        at scala.util.Try$.apply(Try.scala:192)
        at scala.util.Success.map(Try.scala:237)
        at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
        at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.5
        at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
        at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
        at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751)
        at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
        at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)`enter code here`
        ... 157 more

1 个答案:

答案 0 :(得分:1)

我找到了解决方案,但我想知道是否有人用其他方法来解决这个问题。

build.sbt中,我将jackson-databind的版本强制为“ 2.6.7.2”:

dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7.2"
)