I am new to Avro schemas. I created the following schema from a reference JSON, but I am unable to create a serializer for it.
{
  "name": "Name",
  "type": "record",
  "namespace": "NameSpace",
  "fields": [
    {
      "name": "discussions",
      "comment": "discussion ID.",
      "type": {
        "type": "array",
        "items": {
          "name": "discussionsRecord",
          "comment": "discussion Identifier.",
          "type": "record",
          "fields": [
            {
              "name": "discussionId",
              "type": "long"
            },
            {
              "name": "channelType",
              "comment": "channel Type Identification.",
              "type": "int"
            },
            {
              "name": "data",
              "comment": "The following block is to capture channel values.",
              "type": {
                "type": "array",
                "items": [
                  {
                    "name": "dataRecord",
                    "type": "record",
                    "fields": [
                      {
                        "name": "pulse",
                        "comment": "Pulse.",
                        "type": "long"
                      },
                      {
                        "name": "communicationName",
                        "comment": "communication Identification.",
                        "type": {
                          "name": "communicationNameEnumType",
                          "comment": "enum for communication Names.",
                          "type": "enum",
                          "symbols": ["cold", "rainIntensity", "heat"]
                        }
                      },
                      {
                        "name": "communicationValue",
                        "comment": "communication Values.",
                        "type": "double"
                      },
                      {
                        "name": "classValue",
                        "comment": "communication class.",
                        "type": {
                          "name": "classValueEnumType",
                          "comment": "enum for Class types.",
                          "type": "enum",
                          "symbols": ["Dark", "Logical"]
                        }
                      }
                    ]
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}
Answer 0 (score: 0)
If you have an AVSC schema, you can create a SparkSQL schema from it like this (Scala):
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters

val avroSchema: String = ...  // the AVSC schema as a JSON string
val sparkSchema = SchemaConverters.toSqlType(new Schema.Parser().parse(avroSchema))
Otherwise, to_avro() serializes an existing DataFrame, with its schema, to Avro output.
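A minimal sketch of that second route, assuming Spark 3.x with the spark-avro package on the classpath; the session, DataFrame, column names, and output path are all hypothetical and only illustrate the shape of the call:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.functions.to_avro
import org.apache.spark.sql.functions.struct

// Hypothetical local session and sample DataFrame; the column
// names loosely mirror the fields in the question's schema.
val spark = SparkSession.builder().master("local[*]").appName("avro-demo").getOrCreate()
import spark.implicits._

val df = Seq((1L, 0), (2L, 1)).toDF("discussionId", "channelType")

// Pack all columns into a single struct column, then serialize
// that struct to Avro binary with to_avro().
val avroDf = df.select(to_avro(struct(df.columns.map(df.col): _*)).as("value"))

// Or write the whole DataFrame out in the Avro file format directly
// (path is illustrative).
df.write.format("avro").save("/tmp/discussions-avro")
```

Note that to_avro() derives the Avro schema from the DataFrame's own schema; it does not validate against the hand-written AVSC above.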