当我尝试使用Scala中的cosmosdb连接器从SparkSql数据集读取时,出现异常。我在本地模式下运行Spark。
我的sbt依赖项如下:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % "2.3.1" % Provided withSources() withJavadoc(),
"com.microsoft.azure" % "azure-cosmosdb-spark_2.3.0_2.11" % "1.3.3" % Provided withSources() withJavadoc(),
"org.specs2" %% "specs2-core" % "4.3.4" % Test withSources() withJavadoc()
)
我正在运行的代码如下:
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark.config.Config
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.{Dataset, SparkSession}
import org.specs2.mutable.Specification
val spark: SparkSession = SparkSession.builder.master("local").getOrCreate()
val schema: StructType = ScalaReflection.schemaFor[MyCaseClass].dataType.asInstanceOf[StructType]
val configMap: Config = Config(Map(
"Endpoint" -> "https://mydocdb.documents.azure.com:443/",
"Masterkey" -> "supersecretvalue==",
"Database" -> "mydatabase",
"Collection" -> "mycollection"))
import spark.implicits._
val data: Dataset[MyCaseClass] = spark.read.schema(schema).cosmosDB(configMap).as[MyCaseClass]
println(data.count()) // Boom!
我得到的例外是:
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.5
com.fasterxml.jackson.databind
正在对某些版本进行运行时检查,但不喜欢。
我打印了我的库的依赖关系树并得到了:
[info] com.custom.library:mylib_2.11:na [S]
[info] +-com.microsoft.azure:azure-cosmosdb-spark_2.3.0_2.11:1.3.3 [S]
[info] | +-com.fasterxml.jackson.core:jackson-databind:2.9.5
[info] | | +-com.fasterxml.jackson.core:jackson-annotations:2.9.0
[info] | | +-com.fasterxml.jackson.core:jackson-core:2.9.5
...
[info] +-org.apache.spark:spark-sql_2.11:2.3.1
[info] +-com.fasterxml.jackson.core:jackson-databind:2.6.7.1 (evicted by: 2.9.5)
[info] | +-com.fasterxml.jackson.core:jackson-annotations:2.6.0 (evicted by: 2.9.0)
[info] | +-com.fasterxml.jackson.core:jackson-annotations:2.6.7 (evicted by: 2.9.0)
[info] | +-com.fasterxml.jackson.core:jackson-annotations:2.9.0
[info] | +-com.fasterxml.jackson.core:jackson-core:2.6.7 (evicted by: 2.9.5)
[info] | +-com.fasterxml.jackson.core:jackson-core:2.7.9 (evicted by: 2.9.5)
[info] | +-com.fasterxml.jackson.core:jackson-core:2.9.5
java.lang.ExceptionInInitializerError
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.map(RDD.scala:370)
at com.microsoft.azure.cosmosdb.spark.schema.CosmosDBRowConverter$.asRow(CosmosDBRowConverter.scala:69)
at com.microsoft.azure.cosmosdb.spark.schema.CosmosDBRelation.buildScan(CosmosDBRelation.scala:78)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:338)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:337)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:393)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:333)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3249)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2769)
at com.custom.Logic.MyClass$.apply(MyClass.scala:28)
at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
at org.specs2.execute.Result$$anon$11.asResult(Result.scala:246)
at org.specs2.execute.AsResult$.apply(AsResult.scala:32)
at org.specs2.specification.core.AsExecution$$anon$1$$anonfun$execute$1.apply(AsExecution.scala:15)
at org.specs2.specification.core.AsExecution$$anon$1$$anonfun$execute$1.apply(AsExecution.scala:15)
at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
at org.specs2.execute.Result$$anon$11.asResult(Result.scala:246)
at org.specs2.execute.AsResult$.apply(AsResult.scala:32)
at org.specs2.execute.AsResult$$anonfun$safely$1.apply(AsResult.scala:40)
at org.specs2.execute.AsResult$$anonfun$safely$1.apply(AsResult.scala:40)
at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
at org.specs2.execute.AsResult$.safely(AsResult.scala:40)
at org.specs2.specification.core.Execution$$anonfun$result$1.apply(Execution.scala:305)
at org.specs2.specification.core.Execution$$anonfun$result$1.apply(Execution.scala:305)
at org.specs2.specification.core.Execution$$anonfun$withEnvSync$1$$anonfun$apply$20$$anonfun$apply$21.apply(Execution.scala:323)
at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
at org.specs2.execute.Result$$anon$11.asResult(Result.scala:246)
at org.specs2.execute.AsResult$.apply(AsResult.scala:32)
at org.specs2.execute.AsResult$$anonfun$safely$1.apply(AsResult.scala:40)
at org.specs2.execute.AsResult$$anonfun$safely$1.apply(AsResult.scala:40)
at org.specs2.execute.ResultExecution$class.execute(ResultExecution.scala:23)
at org.specs2.execute.ResultExecution$.execute(ResultExecution.scala:121)
at org.specs2.execute.AsResult$.safely(AsResult.scala:40)
at org.specs2.specification.core.Execution$$anonfun$withEnvSync$1$$anonfun$apply$20.apply(Execution.scala:323)
at org.specs2.specification.core.Execution$$anonfun$withEnvSync$1$$anonfun$apply$20.apply(Execution.scala:323)
at org.specs2.specification.core.Execution$$anonfun$2$$anonfun$apply$5.apply(Execution.scala:135)
at org.specs2.specification.core.Execution$$anonfun$2$$anonfun$apply$5.apply(Execution.scala:135)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Success.map(Try.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.9.5
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)`enter code here`
... 157 more
答案 0 :(得分:1)
我找到了解决方案,但我想知道是否有人用其他方法来解决这个问题。
在build.sbt
中,我将jackson-databind
的版本强制为“ 2.6.7.2”:
dependencyOverrides ++= Seq(
"com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7.2"
)