I have a nested Avro schema for Kafka messages that I'm trying to convert into relational DataFrames with PySpark, so I want to flatten the schema and end up with multiple flattened DataFrames in Python.
Here is the sample nested Avro schema:
{
"name": "user",
"type": "record"
"fields": [
{"name": "first_name", "type": "string" },
{"name": "last_name", "type": "string" },
{"name": "present_address", "type": {
"name": "addressField"
"type": "record",
"fields": [
{"name": "street_name", "type": "string"},
{"name": "city", "type": "string"}
]
}},
{"name": "permanent_address",
"type": {"type": "array", "items": "addressField"}}
]
}
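For context, this is roughly how I deserialize the Kafka payload today (a simplified sketch; the broker address, topic name, and file path are placeholders, and it needs the external spark-avro package on the classpath):

# Sketch of the current ingestion step; names and options are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.avro.functions import from_avro

spark = SparkSession.builder.appName("flatten-avro").getOrCreate()

user_schema_json = open("user.avsc").read()  # the nested schema shown above

raw_df = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "users")
    .load()
)

# Deserialize the Avro payload into one nested struct column, then unwrap it.
user_df = (
    raw_df.select(from_avro(col("value"), user_schema_json).alias("user"))
    .select("user.*")
)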
I want to expand it into multiple DataFrames with corresponding Avro schemas, like the ones below:
{
"name": "user",
"type": "record",
"fields": [
{"name": "first_name", "type": "string" },
{"name": "last_name", "type": "string" }
]
}
{
"name": "present_address",
"type": "record",
"fields": [
{"name": "first_name", "type": "string" },
{"name": "last_name", "type": "string" },
{"name": "street_name", "type": "string"},
{"name": "city", "type": "string"}
]
}
{
"name": "permanent_address",
"type": "record",
"fields": [
{"name": "first_name", "type": "string" },
{"name": "last_name", "type": "string" },
{"name": "street_name", "type": "string"},
{"name": "city", "type": "string"}
]
}
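In DataFrame terms, the split I have in mind looks roughly like this (a sketch assuming user_df holds the deserialized nested struct from above; writing it by hand like this for every schema is exactly what I want to avoid):

from pyspark.sql.functions import col, explode

# user: just the top-level scalar fields
user_flat_df = user_df.select("first_name", "last_name")

# present_address: parent scalars plus the nested record's fields
present_address_df = user_df.select(
    "first_name",
    "last_name",
    col("present_address.street_name"),
    col("present_address.city"),
)

# permanent_address: explode the array, then pull the struct fields up
permanent_address_df = (
    user_df.select("first_name", "last_name",
                   explode("permanent_address").alias("addr"))
    .select("first_name", "last_name",
            col("addr.street_name"), col("addr.city"))
)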
I tried traversing the initial Avro schema and flattening it myself, but there are a lot of conditions to handle (so it is probably error-prone). Is there a built-in Python/PySpark module that can do this conversion?
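To give an idea of the conditions I mean, here is a simplified version of the traversal I have been hand-rolling (the function and variable names are mine; unions, maps, nested arrays, and named-type references like "addressField" are not handled yet):

import json

def flatten_record(schema, parent_fields=None):
    """Recursively split a nested Avro record schema into flat record schemas.
    Returns {record_name: list_of_flat_fields}. Simplified: only handles
    records, inline arrays of records, and primitive types."""
    parent_fields = parent_fields or []
    flat = {schema["name"]: list(parent_fields)}
    for field in schema["fields"]:
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "record":
            # nested record -> its own flat schema, carrying the parent's scalars
            flat.update(flatten_record({**ftype, "name": field["name"]},
                                       flat[schema["name"]]))
        elif isinstance(ftype, dict) and ftype.get("type") == "array":
            # array of inline records -> same treatment after unwrapping "items"
            items = ftype["items"]
            if isinstance(items, dict):
                flat.update(flatten_record({**items, "name": field["name"]},
                                           flat[schema["name"]]))
            # arrays whose "items" is a named-type reference (like "addressField")
            # or a primitive are among the cases I still have to special-case
        else:
            flat[schema["name"]].append(field)
    return flat

schema = json.loads(open("user.avsc").read())
flat_schemas = flatten_record(schema)
# produces "user" and "present_address" as desired, but "permanent_address"
# is silently skipped because its items are a named-type reference

This kind of case-by-case handling is what I would rather replace with an existing library if one exists.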