Given the DataFrame:
id,col
63975914,acacia
63975911,better
65475384,acacia
65475385,excelsa
I want to reshape the DataFrame so that the ids for each col value are combined into one row, like this:
col, value
acacia, 63975914,65475384
better, 63975911
excelsa, 65475385
How can I do this with PySpark?
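To pin down the target shape: this is a group-by on col that concatenates the ids in each group, not a true pivot. The sketch below shows the same reshaping in plain Python, without Spark. The `group_ids` helper is hypothetical and exists only for illustration. It keeps duplicate ids and input order.

```python
from collections import defaultdict

# Input rows from the question: (id, col)
rows = [
    (63975914, "acacia"),
    (63975911, "better"),
    (65475384, "acacia"),
    (65475385, "excelsa"),
]

def group_ids(rows):
    """Group ids by col and join each group's ids into one comma-separated string."""
    groups = defaultdict(list)
    for id_, col in rows:
        groups[col].append(str(id_))  # keep ids in input order, duplicates included
    return {col: ",".join(ids) for col, ids in groups.items()}

print(group_ids(rows))
# {'acacia': '63975914,65475384', 'better': '63975911', 'excelsa': '65475385'}
```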
Answer 0 (score: 0)
Use collect_set() to gather the ids in each group, and concat_ws() to join them into a comma-separated string:
from pyspark.sql import functions as F
# Group by col, collect the distinct ids per group, and join them with commas
df.groupBy("col").agg(F.concat_ws(",", F.collect_set("id")).alias("value")).show()
+-------+-----------------+
| col| value|
+-------+-----------------+
| acacia|63975914,65475384|
| better| 63975911|
|excelsa| 65475385|
+-------+-----------------+