Converting DataFrame columns into a nested JSON structure with PySpark

Time: 2021-06-10 00:51:05

Tags: json apache-spark pyspark apache-spark-sql

I'm fairly new to PySpark.

Here is the code I've tried so far:

ConceptGriddf = spark.sql("""
    SELECT DataID,
           collect_list(struct(ConceptGridName AS Title,
                               named_struct("ZoneName", ZoneName,
                                            "Zone", Zone,
                                            "ZoneScore", ZoneScore) AS Zones)) AS ConceptGrid
    FROM OutputTable
    GROUP BY DataID
""")

if dbutils.widgets.get("Debug") == 'yes':
  display(ConceptGriddf.limit(10))

Input table (OutputTable):

DataID  ConceptGridName  Zone  ZoneName  ZoneScore
1       Reserved_89      y     Shared    0.58115
1       Reserved_89      x     Unshared  0.4939

This is the JSON output I currently get in the DataFrame:

"ConceptGrid": [
    {
        "Title": "Reserved_89",
        "Zones": {
            "ZoneName": "Shared",
            "Zone": "y",
            "ZoneScore": 0.58115
        }
    },
    {
        "Title": "Reserved_89",
        "Zones": {
            "ZoneName": "Unshared",
            "Zone": "x",
            "ZoneScore": 0.4939
        }
    }
  ]
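The output is flat because the query groups only by DataID: collect_list adds one struct per input row, and each struct's Zones field is the single named_struct built from that row. The plain-Python sketch below is an illustrative model of that aggregation, not Spark code; the helper name one_level and the hard-coded rows (copied from the table above) are assumptions made for illustration.

```python
import json
from collections import defaultdict

# Sample rows copied from the input table in the question.
rows = [
    {"DataID": 1, "ConceptGridName": "Reserved_89", "Zone": "y", "ZoneName": "Shared", "ZoneScore": 0.58115},
    {"DataID": 1, "ConceptGridName": "Reserved_89", "Zone": "x", "ZoneName": "Unshared", "ZoneScore": 0.4939},
]


def one_level(rows):
    """Hypothetical model of GROUP BY DataID with
    collect_list(struct(Title, named_struct(...) AS Zones))."""
    grouped = defaultdict(list)
    for r in rows:
        # Every input row becomes its own list element, so Zones is a single object.
        grouped[r["DataID"]].append({
            "Title": r["ConceptGridName"],
            "Zones": {"ZoneName": r["ZoneName"], "Zone": r["Zone"], "ZoneScore": r["ZoneScore"]},
        })
    return {data_id: {"ConceptGrid": grid} for data_id, grid in grouped.items()}


print(json.dumps(one_level(rows)[1]))
```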

Instead, I want it to look like this, but I can't get there with the functions I'm using:

"ConceptGrid": [
    {
        "Title": "Reserved_89",
        "Zones": [
            {
                "ZoneName": "Shared",
                "Zone": "y",
                "ZoneScore": 0.58115
            },
            {
                "ZoneName": "Unshared",
                "Zone": "x",
                "ZoneScore": 0.4939
            }
        ]
    }
]
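Getting Zones as an array takes two aggregation levels: first collect the zones of each (DataID, ConceptGridName) pair, then collect the resulting Title/Zones objects per DataID. A plain-Python model of that two-step grouping, again not Spark code, with a hypothetical two_level helper and the sample rows hard-coded:

```python
import json
from collections import defaultdict

# Sample rows copied from the input table in the question.
rows = [
    {"DataID": 1, "ConceptGridName": "Reserved_89", "Zone": "y", "ZoneName": "Shared", "ZoneScore": 0.58115},
    {"DataID": 1, "ConceptGridName": "Reserved_89", "Zone": "x", "ZoneName": "Unshared", "ZoneScore": 0.4939},
]


def two_level(rows):
    """Hypothetical model: group zones by (DataID, title), then group titles by DataID."""
    # Step 1: like GROUP BY DataID, ConceptGridName with collect_list(struct(ZoneName, Zone, ZoneScore)).
    zones_by_title = defaultdict(list)
    for r in rows:
        zones_by_title[(r["DataID"], r["ConceptGridName"])].append(
            {"ZoneName": r["ZoneName"], "Zone": r["Zone"], "ZoneScore": r["ZoneScore"]}
        )
    # Step 2: like GROUP BY DataID with collect_list(struct(Title, Zones)).
    grids = defaultdict(list)
    for (data_id, title), zones in zones_by_title.items():
        grids[data_id].append({"Title": title, "Zones": zones})
    return {data_id: {"ConceptGrid": grid} for data_id, grid in grids.items()}


print(json.dumps(two_level(rows)[1]))
# {"ConceptGrid": [{"Title": "Reserved_89", "Zones": [{"ZoneName": "Shared", "Zone": "y", "ZoneScore": 0.58115}, {"ZoneName": "Unshared", "Zone": "x", "ZoneScore": 0.4939}]}]}
```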

1 answer:

Answer 0 (score: 1)

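Check the code below. It first collects the zones of each ConceptGridName into an array, then collects the resulting Title/Zones structs into ConceptGrid.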

from pyspark.sql import functions as F

ConceptGriddf = spark.sql("""
    WITH data AS (SELECT DataID, ConceptGridName AS Title,
                         collect_list(struct(ZoneName, Zone, ZoneScore)) AS Zones
                  FROM OutputTable GROUP BY DataID, ConceptGridName)
    SELECT DataID, collect_list(struct(Title, Zones)) AS ConceptGrid FROM data GROUP BY DataID""")
# .toJSON.show(false) is Scala syntax; in PySpark, serialize the column with to_json instead.
ConceptGriddf.select(F.to_json(F.struct("ConceptGrid")).alias("value")).show(truncate=False)
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                         |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"ConceptGrid":[{"Title":"Reserved_89","Zones":[{"ZoneName":"Shared","Zone":"y","ZoneScore":0.58115},{"ZoneName":"Unshared","Zone":"x","ZoneScore":0.4939}]}]}|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+