How to convert a nested Avro schema into flattened Avro schemas using Python / PySpark

Date: 2019-11-19 15:06:03

Tags: python dataframe pyspark avro

I have a nested Avro schema for Kafka messages. I am trying to convert the messages into relational DataFrames using PySpark, so I want to flatten the schema in Python and end up with multiple flattened DataFrames.

Below is the sample nested Avro schema:

{
  "name": "user",
  "type": "record",
  "fields": [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "present_address", "type": {
        "name": "addressField",
        "type": "record",
        "fields": [
            {"name": "street_name", "type": "string"},
            {"name": "city", "type": "string"}
        ]
    }},
    {"name": "permanent_address",
     "type": {"type": "array", "items": "addressField"}}
  ]
}

I want to expand it into multiple DataFrames, each with a corresponding Avro schema, as shown below:

{
    "name": "user",
    "type": "record",
    "fields": [
        {"name": "first_name", "type": "string" },
        {"name": "last_name", "type": "string" }
    ]
}

{
    "name": "present_address",
    "type": "record",
    "fields": [
        {"name": "first_name", "type": "string" },
        {"name": "last_name", "type": "string" },
        {"name": "street_name", "type": "string"},
        {"name": "city", "type": "string"}
    ]
}

{
    "name": "permanent_address",
    "type": "record",
    "fields": [
        {"name": "first_name", "type": "string" },
        {"name": "last_name", "type": "string" },
        {"name": "street_name", "type": "string"},
        {"name": "city", "type": "string"}
    ]
}

I have tried traversing the initial Avro schema and flattening it myself, but there are many conditions to handle (which is likely to be error-prone). Is there a built-in Python / PySpark module that can do this conversion?
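
For reference, this is roughly the kind of hand-rolled recursive walk I mean. It works on the parsed schema JSON and only covers the cases in the example above (the file name user.avsc is just a placeholder; unions, maps, enums, logical types and nullable fields are not handled):

import json

def flatten_avro_schema(schema, parent_fields=None, named_types=None, out=None):
    """Split a nested Avro record schema into one flat record schema per
    nested record / array-of-record field.  The parent's scalar fields are
    repeated in every child schema, as in the example above.

    Sketch only: unions, maps, enums and arrays of scalars are not handled.
    """
    parent_fields = parent_fields or []
    named_types = {} if named_types is None else named_types
    out = [] if out is None else out

    scalars, nested = [], []
    for field in schema["fields"]:
        ftype = field["type"]
        # Resolve a reference to a previously defined named type
        # (e.g. "addressField" is defined once and then reused by name).
        if isinstance(ftype, str) and ftype in named_types:
            ftype = named_types[ftype]
        if isinstance(ftype, dict) and ftype.get("type") == "record":
            named_types[ftype["name"]] = ftype
            nested.append((field["name"], ftype))
        elif isinstance(ftype, dict) and ftype.get("type") == "array":
            items = ftype["items"]
            if isinstance(items, str) and items in named_types:
                items = named_types[items]
            nested.append((field["name"], items))
        else:
            scalars.append(field)

    # Emit the flat schema for this level: parent scalars + own scalars.
    out.append({"name": schema["name"],
                "type": "record",
                "fields": parent_fields + scalars})

    # Recurse into each nested record, carrying the scalars down as keys.
    for child_name, child_record in nested:
        flatten_avro_schema({"name": child_name, "fields": child_record["fields"]},
                            parent_fields + scalars, named_types, out)
    return out

with open("user.avsc") as f:          # placeholder file holding the schema above
    nested_schema = json.load(f)

for flat in flatten_avro_schema(nested_schema):
    print(json.dumps(flat, indent=2))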

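For completeness, the per-schema PySpark version I would otherwise have to write by hand looks roughly like this (df is assumed to already hold the decoded Kafka messages matching the nested schema, e.g. via from_avro from the spark-avro package). This is exactly the per-schema boilerplate I am hoping a built-in module could replace:

from pyspark.sql.functions import col, explode

# Flat frame 1: top-level scalar fields only
user_df = df.select("first_name", "last_name")

# Flat frame 2: expand the nested struct, keeping the parent scalars
present_address_df = df.select(
    "first_name",
    "last_name",
    col("present_address.street_name").alias("street_name"),
    col("present_address.city").alias("city"),
)

# Flat frame 3: one row per array element, then expand the element's fields
permanent_address_df = (
    df.select("first_name", "last_name",
              explode("permanent_address").alias("addr"))
      .select("first_name", "last_name", "addr.street_name", "addr.city")
)
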
0 Answers:

No answers