org.apache.avro.SchemaParseException: Undefined name

Date: 2019-01-03 19:35:38

Tags: scala apache-spark kubernetes avro openstack-nova

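My job reads messages from Kafka and decodes them. The Spark job runs fine on a local Spark cluster on my Mac (https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz). However, when I run the same job against a Spark cluster (2.2.1-hadoop-2.7, from https://hub.docker.com/r/gettyimages/spark/tags) on Kubernetes and OpenStack, I get the following error: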

Caused by: org.apache.avro.SchemaParseException: Undefined name: "my_items"
    at org.apache.avro.Schema.parse(Schema.java:1052)
    at org.apache.avro.Schema.parse(Schema.java:1164)
    at org.apache.avro.Schema.parse(Schema.java:1094)
    at org.apache.avro.Schema$Parser.parse(Schema.java:927)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)

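I'm fairly sure the Spark job is parsing the same schema in both cases. Why does the job fail on my Kubernetes Spark cluster? I'm really confused and don't yet know how to debug this. Any hints are welcome. Thanks.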

1 answer:

Answer 0 (score: 0)

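The Avro dependency is probably different: the container image only has avro-1.7.4 (from Hadoop's libraries), while the local Spark distribution ships avro-1.7.7.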

Container image:

$ find / | grep avro
/usr/spark-2.2.1/examples/src/main/python/avro_inputformat.py
/usr/spark-2.2.1/examples/src/main/resources/users.avro
/usr/spark-2.2.1/jars/avro-mapred-1.7.7-hadoop2.jar
/usr/hadoop-2.7.3/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar
/usr/hadoop-2.7.3/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar
/usr/hadoop-2.7.3/share/hadoop/common/lib/avro-1.7.4.jar
/usr/hadoop-2.7.3/share/hadoop/tools/lib/avro-1.7.4.jar
/usr/hadoop-2.7.3/share/hadoop/mapreduce/lib/avro-1.7.4.jar

Contents of spark-2.2.1-bin-hadoop2.7.tgz:

$ find . | grep avro
./spark-2.2.1-bin-hadoop2.7/examples/src/main/python/avro_inputformat.py
./spark-2.2.1-bin-hadoop2.7/examples/src/main/resources/users.avro
./spark-2.2.1-bin-hadoop2.7/jars/avro-mapred-1.7.7-hadoop2.jar
./spark-2.2.1-bin-hadoop2.7/jars/avro-1.7.7.jar
./spark-2.2.1-bin-hadoop2.7/jars/avro-ipc-1.7.7.jar
# Hadoop Avro?
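To repeat the comparison on other machines without reading through the full `find` output, here is a minimal sketch of a shell helper. `avro_core_versions` is a hypothetical name, not an existing tool; it assumes Avro core jars follow the `avro-X.Y.Z.jar` naming seen in the listings, and it deliberately skips companion jars such as `avro-mapred-*` and `avro-ipc-*`.

```shell
# avro_core_versions DIR...
# Hypothetical helper: prints the distinct versions of the Avro *core*
# jar (avro-X.Y.Z.jar) found under the given directories.
# The name pattern 'avro-[0-9]*.jar' excludes avro-mapred-*.jar and
# avro-ipc-*.jar, because their names do not start with a digit after "avro-".
avro_core_versions() {
  for root in "$@"; do
    find "$root" -type f -name 'avro-[0-9]*.jar' 2>/dev/null
  done | sed -e 's#.*/avro-\([0-9][0-9.]*\)\.jar$#\1#' | sort -u
}

# Example usage (paths taken from the listings above):
#   avro_core_versions /usr/spark-2.2.1 /usr/hadoop-2.7.3   # inside the container
#   avro_core_versions ./spark-2.2.1-bin-hadoop2.7          # unpacked local tarball
```

Based on the listings above, the first call should print only 1.7.4 and the second only 1.7.7. If the mismatch is confirmed, one workaround that may be worth trying (untested against this particular error) is to ship avro-1.7.7.jar with the job via `--jars` and set the experimental `spark.driver.userClassPathFirst` / `spark.executor.userClassPathFirst` options so the job's own jars take precedence over the cluster's.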