Spark is unable to read the Avro file format

Time: 2019-08-30 10:12:35

Tags: apache-spark apache-spark-sql spark-avro

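When I query an .avro file in Apache Drill, I get the "Body" column values correctly, as shown in the first screenshot below. But if I run the same query in Spark-SQL, the Body column values come back in binary format. Is there a way to read the data correctly in Spark-SQL? I have attached both images: Apache Drill is able to read the column, whereas Apache Spark reads the Body column in binary format. Any help would be appreciated.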

Apache Drill image:

Apache drill is able to read the Body column values correctly

Spark-SQL image:

spark-sql is reading the Body column values in a binary format

avroDF.printSchema
root
 |-- SequenceNumber: long (nullable = true)
 |-- Offset: string (nullable = true)
 |-- EnqueuedTimeUtc: string (nullable = true)
 |-- SystemProperties: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- member0: long (nullable = true)
 |    |    |-- member1: double (nullable = true)
 |    |    |-- member2: string (nullable = true)
 |    |    |-- member3: binary (nullable = true)
 |-- Properties: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- member0: long (nullable = true)
 |    |    |-- member1: double (nullable = true)
 |    |    |-- member2: string (nullable = true)
 |    |    |-- member3: binary (nullable = true)
 |-- Body: binary (nullable = true)
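
The schema above reports Body as binary, so Spark is returning the raw Avro bytes rather than decoding them. One possible approach, assuming the Body payload is UTF-8 encoded text (the question does not confirm this), is to cast the column to a string. The sketch below is not a confirmed fix: `sampleDF` and its JSON-like payload are hypothetical stand-ins for `avroDF`, built in memory so the snippet runs without an Avro file, and the view name `events` is likewise made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("body-as-string")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for avroDF: one binary column named Body.
// Assumption: the real payload is UTF-8 text, as this sample is.
val payload = """{"temperature": 21.5}"""
val sampleDF = Seq(payload.getBytes("UTF-8")).toDF("Body")

// DataFrame API: casting binary to string decodes the bytes as UTF-8.
val decodedDF = sampleDF.withColumn("Body", col("Body").cast("string"))
decodedDF.printSchema()
decodedDF.show(false)

// Spark-SQL equivalent, using a temporary view.
sampleDF.createOrReplaceTempView("events")
val viaSql = spark.sql("SELECT CAST(Body AS STRING) AS Body FROM events")
viaSql.show(false)
```

With the real data, the same cast would be applied to `avroDF` instead of `sampleDF`. If the bytes are not UTF-8 text (for example, a nested serialized record), the cast would produce unreadable output and a format-specific decoder would be needed instead.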

0 answers:

No answers