I'm trying to extract data from an Avro file using Spark 1.6.2, but one of the fields is BinaryType and I can't find a way to decode it.
Here is what I'm doing:
import com.databricks.spark.avro._
val data = sqlContext.read.format("com.databricks.spark.avro").load("plant/topics/1237/raw-data+0+0217766569+0217826453+1237+3805+1478076745829.avro")
data: org.apache.spark.sql.DataFrame = [PLANT: string, MACHINE: string, DATA_TYPE: string, TIMESTAMP: bigint, EMITTER: string, API: string, PAYLOAD: binary, SIZE: bigint, CONTENT_TYPE: string, APP: string, PART: int, PARTS: int, DATA_MODE: string, SYSTEM_VERSION: string, FORMAT_VERSION: string, DATA_COUNT: int]
PAYLOAD is my binary column, but I can't seem to decode it, even when I try a different approach such as supplying an explicit schema:
import org.apache.spark.sql.types._
val struct = StructType(StructField("PAYLOAD", BinaryType, true) :: Nil)
struct: org.apache.spark.sql.types.StructType = StructType(StructField(PAYLOAD,BinaryType,true))
val data = sqlContext.read.format("com.databricks.spark.avro").schema(struct).load("plant/topics/1237/raw-data+0+0217766569+0217826453+1237+3805+1478076745829.avro")
data: org.apache.spark.sql.DataFrame = [PAYLOAD: binary]
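For reference, here is a minimal sketch of the kind of decoding I'm after. It assumes the PAYLOAD bytes are UTF-8 encoded text, which I have not confirmed (the CONTENT_TYPE column may say otherwise). The helper name decodePayload and the output column name PAYLOAD_TEXT are made up for illustration.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper: converts the raw bytes of one PAYLOAD value to text.
// Assumption: the payload is UTF-8 text. If it is compressed or serialized
// in another format, this will only produce garbage characters.
def decodePayload(bytes: Array[Byte]): String =
  if (bytes == null) null else new String(bytes, StandardCharsets.UTF_8)

// In spark-shell, the helper could be wrapped in a UDF roughly like this:
//   import org.apache.spark.sql.functions.udf
//   val decodeUdf = udf(decodePayload _)
//   val decoded = data.withColumn("PAYLOAD_TEXT", decodeUdf(data("PAYLOAD")))
//   decoded.select("PAYLOAD_TEXT").show(5, false)
```

If the UTF-8 assumption holds, casting the column with data("PAYLOAD").cast("string") should give the same text without a UDF.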
Any help would be much appreciated!