Pivot a Spark DataFrame

Time: 2020-10-09 03:02:52

Tags: python dataframe pyspark

Given the DataFrame:

id,col 
63975914,acacia
63975911,better 
65475384,acacia 
65475385,excelsa

I want to pivot the DataFrame so that it looks like this:

col, value
acacia, 63975914,65475384
better, 63975911
excelsa, 65475385

How can I do this with PySpark?

1 answer:

Answer 0: (score: 0)

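Use collect_set() together with concat_ws(): group by col, collect each group's id values into an array, then join that array with commas. Note that collect_set() drops duplicate ids and does not guarantee their order; collect_list() keeps duplicates if you need them.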

from pyspark.sql import functions as F

df.groupBy("col").agg(F.concat_ws(",", F.collect_set("id")).alias('value')).show()
+-------+-----------------+
|    col|            value|
+-------+-----------------+
| acacia|63975914,65475384|
| better|         63975911|
|excelsa|         65475385|
+-------+-----------------+
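
To check the expected result without a Spark session, the same group-then-join logic can be sketched in plain Python. This is only an illustration of what the aggregation computes, not the PySpark API; the rows list and the group_ids helper are hypothetical names introduced here, and the ids are sorted only to make the output deterministic.

```python
from collections import defaultdict

# Sample rows copied from the question: (id, col) pairs.
rows = [
    (63975914, "acacia"),
    (63975911, "better"),
    (65475384, "acacia"),
    (65475385, "excelsa"),
]


def group_ids(rows):
    """Group ids by col and join each group's ids into one comma-separated string."""
    groups = defaultdict(set)  # a set drops duplicate ids, similar to collect_set()
    for id_, col in rows:
        groups[col].add(id_)
    # Sort the ids so the result is deterministic; collect_set() makes no ordering promise.
    return {
        col: ",".join(str(i) for i in sorted(ids))
        for col, ids in sorted(groups.items())
    }


result = group_ids(rows)
for col, value in result.items():
    print(col, value)
```

If the order of ids matters in Spark itself, one option is to wrap the collected array in F.sort_array() before passing it to F.concat_ws().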