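When I query an .avro file in Apache Drill, I get the Body column values correctly, as shown in the snapshot below. But when I do the same in Spark-SQL, the Body column values come back in binary format. Is there a way to read the data properly in Spark-SQL? I have attached two images: Apache Drill reads the column without any issue, whereas Apache Spark reads the Body column in binary format. Any help would be appreciated.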
Apache Drill image.
Spark-SQL image.
avroDF.printSchema
root
 |-- SequenceNumber: long (nullable = true)
 |-- Offset: string (nullable = true)
 |-- EnqueuedTimeUtc: string (nullable = true)
 |-- SystemProperties: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- member0: long (nullable = true)
 |    |    |-- member1: double (nullable = true)
 |    |    |-- member2: string (nullable = true)
 |    |    |-- member3: binary (nullable = true)
 |-- Properties: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- member0: long (nullable = true)
 |    |    |-- member1: double (nullable = true)
 |    |    |-- member2: string (nullable = true)
 |    |    |-- member3: binary (nullable = true)
 |-- Body: binary (nullable = true)
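
Since the schema reports Body as binary, one thing that might be worth trying is casting that column to a string before displaying it. The following is only a minimal sketch, not a confirmed fix: it assumes the Body payload is UTF-8 encoded text (which the readable Drill output suggests but does not prove), that the spark-avro module is on the classpath (Spark 2.4 or later; older versions used the "com.databricks.spark.avro" format name instead), and the file path and object name are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DecodeBodySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("decode-avro-body")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path: replace with the location of the actual .avro file.
    val avroDF = spark.read.format("avro").load("/path/to/file.avro")

    // Casting binary to string makes Spark interpret the bytes as UTF-8 text.
    // Assumption: the Body payload really is UTF-8 text (for example JSON).
    val decodedDF = avroDF.withColumn("Body", col("Body").cast("string"))

    // Pass false so long Body values are not truncated in the output.
    decodedDF.select("SequenceNumber", "Body").show(false)

    spark.stop()
  }
}
```

The same cast should also be expressible directly in a Spark-SQL query, for example `SELECT CAST(Body AS STRING) AS Body FROM avro_table`, where `avro_table` is a hypothetical temporary view registered from the DataFrame. If the payload is not text, the cast will produce unreadable characters and a different decoding step would be needed.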