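I have a Spark df with a column that holds an array of type:value structs. I can explode it so that each type:value pair becomes its own row, with type and value in separate columns. Now I want to aggregate back so that I get a single row per entity_id with a series of columns, where the column names are the types and the column values are the values.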
df.show(5)
+----------------+-----------------------------------------------------------------------------------------------------------------------------+
|entity_id       |_tags                                                                                                                        |
+----------------+-----------------------------------------------------------------------------------------------------------------------------+
|5bdb7c3...8a17f9|[Row(type='cond1', value='a=1'),Row(type='cond2', value='a=2'),Row(type='cond3', value='a=3'),Row(type='cond4', value='a=4')]|
+----------------+-----------------------------------------------------------------------------------------------------------------------------+
After exploding (tags_exploded = df.select(f.col("entity_id"), f.explode(f.col("_tags")))), I get:
tags_exploded.show(5,False)
+----------------+-----+-----+
|entity_id       |type |value|
+----------------+-----+-----+
|5bdb7c3...8a17f9|cond1|a=1  |
|5bdb7c3...8a17f9|cond2|a=2  |
|5bdb7c3...8a17f9|cond3|a=3  |
|5bdb7c3...8a17f9|cond4|a=4  |
+----------------+-----+-----+
The result I want is:
+----------------+-----+-----+-----+-----+
|entity_id       |cond1|cond2|cond3|cond4|
+----------------+-----+-----+-----+-----+
|5bdb7c3...8a17f9|a=1  |a=2  |a=3  |a=4  |
+----------------+-----+-----+-----+-----+
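How can I aggregate the exploded rows to get the desired result, or alternatively extract the fields directly from the original array to get the same result? To start with, I am only considering the case where every final column is present in every row of the original df (i.e. every entity has cond1, cond2, cond3 and cond4).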