Spark Avro throws NoSuchMethodError when writing files

Posted: 2020-02-19 01:31:02

Tags: scala apache-spark avro spark-avro

Any attempt to write a file in Avro format fails with the stack trace below.

We are running Spark 2.4.3 (with user-provided Hadoop) and Scala 2.12, loading the Avro package at runtime either with spark-shell:

spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.3

or with spark-submit:

spark-submit --packages org.apache.spark:spark-avro_2.12:2.4.3 ...

The Spark session reports that the Avro package loaded successfully.

...In either case, the moment we try to write any data in Avro format, e.g.:

df.write.format("avro").save("hdfs:///path/to/outputfile.avro")

or selecting a column first:

df.select("recordidstring").write.format("avro").save("hdfs:///path/to/outputfile.avro")

...we get the same stack trace error (this copy is from spark-shell):

java.lang.NoSuchMethodError: org.apache.avro.Schema.createUnion([Lorg/apache/avro/Schema;)Lorg/apache/avro/Schema;
  at org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:185)
  at org.apache.spark.sql.avro.SchemaConverters$.$anonfun$toAvroType$1(SchemaConverters.scala:176)
  at scala.collection.Iterator.foreach(Iterator.scala:941)
  at scala.collection.Iterator.foreach$(Iterator.scala:941)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
  at org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:174)
  at org.apache.spark.sql.avro.AvroFileFormat.$anonfun$prepareWrite$2(AvroFileFormat.scala:119)
  at scala.Option.getOrElse(Option.scala:138)
  at org.apache.spark.sql.avro.AvroFileFormat.prepareWrite(AvroFileFormat.scala:118)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:103)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:170)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)

We can write other formats (delimited text, JSON, ORC, Parquet) without any trouble.

We are using HDFS (Hadoop v3.1.2) as the file store.

I have tried different versions of the spark-avro package (e.g. the 2.11 builds, and lower), which either throw the same error or fail to load at all due to incompatibility. The error occurs from Python, Scala (via spark-shell or spark-submit), and Java (via spark-submit).

There appears to be an open issue for this on the apache.org JIRA, but it has been sitting for a year now with no workaround. I have bumped that issue, but in the meantime I am wondering whether the community has a solution? Any help is appreciated.

3 answers:

Answer 0 (score: 0)

Based on a comment in the linked bug, you should explicitly pin Avro to at least version 1.8.0, like this:

spark-submit --packages org.apache.spark:spark-avro_2.12:2.4.3,org.apache.avro:avro:1.9.2 ...

(You may also want to try the reverse package order.)
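If pinning the version does not immediately help, one way to confirm which Avro jar actually wins on the classpath is to ask the JVM directly from spark-shell. This is a minimal diagnostic sketch, assuming the conflicting jar is a stale avro-1.7.x bundled with the Hadoop install; the printed path will vary with your environment:

// Where was org.apache.avro.Schema loaded from? If this prints an
// old avro-1.7.x jar, the varargs createUnion(Schema...) overload the
// stack trace needs does not exist there -- it was added in Avro 1.8.0.
println(classOf[org.apache.avro.Schema].getProtectionDomain.getCodeSource.getLocation)

// Exercise the exact overload that fails in SchemaConverters. On
// Avro 1.8+ this compiles and prints a union schema; against a 1.7.x
// jar the overload is absent and the call will not even compile,
// confirming the version conflict.
import org.apache.avro.Schema
println(Schema.createUnion(Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING)))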

Answer 1 (score: 0)

Friend, I ran into the same error as you, but after I updated to Spark 2.4.4 (the Scala 2.11 build), the problem went away. See the matching command after this paragraph.
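For completeness, the matching runtime invocation for that combination would presumably be (untested on the asker's cluster):

spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.4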

Answer 2 (score: 0)

This issue turned out to be specific to the configuration of our on-premises cluster: single-node builds of HDFS (local on Windows, on other Linux machines, etc.) write Avro without any problem. We will rebuild the problem cluster, but I am confident the fault is a misconfiguration on that cluster alone. Solution: rebuild.
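For anyone who hits the same symptom on a cluster, a quick way to hunt for a conflicting jar before resorting to a rebuild is to list every Avro artifact the driver JVM can see. A sketch, assuming the stale jar sits on the plain Java classpath (e.g. pulled in by the Hadoop install) rather than being visible only through Spark's own classloader:

// Print every classpath entry that looks like an Avro jar. Two entries
// with different versions (say, a 1.7.x from Hadoop next to the 1.8+
// one from --packages) is the classic setup for this NoSuchMethodError.
System.getProperty("java.class.path")
  .split(java.io.File.pathSeparator)
  .filter(_.toLowerCase.contains("avro"))
  .foreach(println)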