I have a DataFrame with the following schema:
 |-- id: string (nullable = true)
 |-- ddd: struct (nullable = true)
 |    |-- aaa: string (nullable = true)
 |    |-- bbb: long (nullable = true)
 |    |-- ccc: string (nullable = true)
 |    |-- eee: long (nullable = true)
Its contents currently look like this:
id | ddd
--------------------------
1 | [hi,1,this,2]
2 | [hello,6,good,3]
1 | [hru,2,where,7]
3 | [in,4,you,1]
2 | [how,4,to,3]
I want the output to be grouped by id, like this:
id | ddd
--------------------
1 | [hi,1,this,2],[hru,2,where,7]
2 | [hello,6,good,3],[how,4,to,3]
3 | [in,4,you,1]
Please help.
Answer 0 (score: 6)
You can use collect_list, as follows:
import org.apache.spark.sql.functions._
// Group rows by id and gather each group's ddd structs into an array
df.groupBy("id").agg(collect_list("ddd").as("ddd"))
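For reference, here is a self-contained sketch of the answer above, written as a spark-shell / Scala script snippet (paste it with :paste in spark-shell). It assumes a local SparkSession and rebuilds the question's sample data with an explicit StructType so the nullable flags match; the names schema, rows and grouped are illustrative, and the orderBy("id") is added only to make the displayed output stable. Keep in mind that Spark documents collect_list as non-deterministic: the order of elements inside each array depends on row order, which a shuffle may change.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Assumption: local mode; in spark-shell, getOrCreate() returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("collect-list-demo").getOrCreate()

// Schema from the question: id plus a struct column ddd with four fields
val schema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("ddd", StructType(Seq(
    StructField("aaa", StringType, nullable = true),
    StructField("bbb", LongType, nullable = true),
    StructField("ccc", StringType, nullable = true),
    StructField("eee", LongType, nullable = true)
  )), nullable = true)
))

// Sample rows from the question; the nested Row fills the ddd struct
val rows = Seq(
  Row("1", Row("hi", 1L, "this", 2L)),
  Row("2", Row("hello", 6L, "good", 3L)),
  Row("1", Row("hru", 2L, "where", 7L)),
  Row("3", Row("in", 4L, "you", 1L)),
  Row("2", Row("how", 4L, "to", 3L))
)
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

// One output row per id; ddd becomes an array of the original structs
val grouped = df.groupBy("id").agg(collect_list("ddd").as("ddd")).orderBy("id")
grouped.show(false)
```

The grouping key stays a plain column, and the aggregated column keeps the struct type intact, so each element of the array still exposes aaa, bbb, ccc and eee.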
collect_set works the same way, except that it keeps only one copy of identical structs within each group:
df.groupBy("id").agg(collect_set("ddd").as("ddd"))
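To see where the two functions differ, the sketch below (again a spark-shell / Scala script snippet, assuming a local SparkSession) uses hypothetical data in which id 1 contains the struct [hi,1,this,2] twice. collect_list keeps both copies, while collect_set keeps one; like collect_list, collect_set gives no guarantee about element order. The column names as_list and as_set are illustrative.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Assumption: local mode; in spark-shell, getOrCreate() returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("collect-set-demo").getOrCreate()

// Same shape as the question's schema (StructField defaults to nullable = true)
val dddType = StructType(Seq(
  StructField("aaa", StringType), StructField("bbb", LongType),
  StructField("ccc", StringType), StructField("eee", LongType)))
val schema = StructType(Seq(StructField("id", StringType), StructField("ddd", dddType)))

// Hypothetical data: id 1 has the struct [hi,1,this,2] twice
val rows = Seq(
  Row("1", Row("hi", 1L, "this", 2L)),
  Row("1", Row("hi", 1L, "this", 2L)),
  Row("1", Row("hru", 2L, "where", 7L))
)
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

val both = df.groupBy("id").agg(
  collect_list("ddd").as("as_list"), // keeps duplicate structs: 3 elements
  collect_set("ddd").as("as_set"))   // drops duplicate structs: 2 elements
both.show(false)
```

If the question's data can contain repeated structs per id and those repeats are meaningful, collect_list is the safer choice; collect_set is for when each distinct struct should appear once.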