I'm still new to PySpark.
Here is the code I've tried so far:
# Build one row per DataID with a ConceptGrid array column.
ConceptGriddf = spark.sql("""
    SELECT DataID,
           collect_list(
               struct(
                   ConceptGridName AS Title,
                   named_struct('ZoneName', ZoneName,
                                'Zone', Zone,
                                'ZoneScore', ZoneScore) AS Zones
               )
           ) AS ConceptGrid
    FROM OutputTable
    GROUP BY DataID
""")

# Show a sample only when the notebook's Debug widget is set to 'yes'.
if dbutils.widgets.get("Debug") == 'yes':
    display(ConceptGriddf.limit(10))
Table:
DataID  ConceptGridName  Zone  ZoneName  ZoneScore
1       Reserved_89      y     Shared    0.58115
1       Reserved_89      x     Unshared  0.4939
This is the JSON output I currently get from the DataFrame:
"ConceptGrid": [
    {
        "Title": "Reserved_89",
        "Zones": {
            "ZoneName": "Shared",
            "Zone": "y",
            "ZoneScore": 0.58115
        }
    },
    {
        "Title": "Reserved_89",
        "Zones": {
            "ZoneName": "Unshared",
            "Zone": "x",
            "ZoneScore": 0.4939
        }
    }
]
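Instead, I want it to look like this, with a single entry per Title and all of its zones in one array, but I can't get there with the functions I'm using:
"ConceptGrid": [
    {
        "Title": "Reserved_89",
        "Zones": [
            {
                "ZoneName": "Shared",
                "Zone": "y",
                "ZoneScore": 0.58115
            },
            {
                "ZoneName": "Unshared",
                "Zone": "x",
                "ZoneScore": 0.4939
            }
        ]
    }
]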
Answer 0 (score: 1)
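Try the code below. It aggregates in two steps: the inner query collects the zones for each ConceptGridName into an array, and the outer query collects those rows into ConceptGrid. Note that the trailing .toJSON.show(false) is Scala syntax; in PySpark, toJSON() returns an RDD of strings, so you would use something like .toJSON().collect() instead.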
spark.sql("""
    WITH data AS (
        SELECT ConceptGridName AS Title,
               collect_list(struct(ZoneName, Zone, ZoneScore)) AS zones
        FROM OutputTable
        GROUP BY ConceptGridName
    )
    SELECT collect_list(struct(Title, zones)) AS ConceptGrid
    FROM data
""").toJSON.show(false)
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"ConceptGrid":[{"Title":"Reserved_89","zones":[{"ZoneName":"Shared","Zone":"y","ZoneScore":0.58115},{"ZoneName":"Unshared","Zone":"x","ZoneScore":0.4939}]}]}|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
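The two-step aggregation in the query above can be mirrored in plain Python, without a Spark session, to make the shape of the result easier to see. This is only an illustrative sketch: `nest_concept_grid` is a hypothetical helper, not part of Spark, and unlike the query in this answer it also keeps the `DataID` grouping from the question, which is an assumption about what the asker wants. In SQL terms, that would roughly correspond to grouping by `DataID, ConceptGridName` in the inner query and by `DataID` in the outer one.

```python
import json
from collections import defaultdict


def nest_concept_grid(rows):
    """Group rows by DataID, then by ConceptGridName, collecting zones into a list.

    Hypothetical helper for illustration only; it mimics the two-step
    collect_list aggregation in plain Python.
    """
    # Step 1: one list of zones per (DataID, ConceptGridName) pair.
    zones_by_key = defaultdict(list)
    for row in rows:
        key = (row["DataID"], row["ConceptGridName"])
        zones_by_key[key].append(
            {
                "ZoneName": row["ZoneName"],
                "Zone": row["Zone"],
                "ZoneScore": row["ZoneScore"],
            }
        )

    # Step 2: one ConceptGrid list per DataID, with one entry per Title.
    grids_by_id = defaultdict(list)
    for (data_id, title), zones in zones_by_key.items():
        grids_by_id[data_id].append({"Title": title, "Zones": zones})

    return [
        {"DataID": data_id, "ConceptGrid": grid}
        for data_id, grid in grids_by_id.items()
    ]


# Sample rows copied from the table in the question.
rows = [
    {"DataID": 1, "ConceptGridName": "Reserved_89", "Zone": "y",
     "ZoneName": "Shared", "ZoneScore": 0.58115},
    {"DataID": 1, "ConceptGridName": "Reserved_89", "Zone": "x",
     "ZoneName": "Unshared", "ZoneScore": 0.4939},
]

result = nest_concept_grid(rows)
print(json.dumps(result, indent=2))
```

The plain-Python version preserves input order, whereas Spark's `collect_list` makes no ordering guarantee after a shuffle, so the order of zones in the real output may differ from run to run.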