I have two tables, order_table and room_table.

order_table:
+----------+-------+
| order_id | info  |
+----------+-------+
| order1   | infos |
+----------+-------+
room_table has many columns:
+----------+---------+-----+
| order_id | room_id | ... |
+----------+---------+-----+
| order1   | room1   | ... |
| order1   | room2   | ... |
+----------+---------+-----+
I want to take the result of select * from room_table group by order_id and add it to order_table as a collected list in a new column, rooms.
The output table should have the following schema:
-order_id string,
-info string,
-room array<struct>
--room_id string,
--room_price int,
--room_name string
-- ....
Answer 0 (score: 2)
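import org.apache.spark.sql.functions.{collect_list, struct}
import spark.implicits._ // needed for toDF; `spark` is the active SparkSession (predefined in spark-shell)

// Sample orders
val df1 = Seq(("order_1", "order_1_info"),
  ("order_2", "order_2_info")).toDF("order_id", "info")
// Sample rooms
val df2 = Seq(("order_1", "room_1", 100, "palace_1"),
  ("order_2", "room_2", 200, "palace_2"),
  ("order_1", "room_3", 100, "palace_3"),
  ("order_2", "room_8", 200, "palace_x"))
  .toDF("order_id", "room_id", "room_price", "room_name")
// Pack every column of the rooms DataFrame into a struct, then collect the structs per order
val cols: Array[String] = df2.columns
val df3 = df2.groupBy("order_id").agg(collect_list(struct(cols.head, cols.tail: _*)) as "room")
// Attach the collected rooms to each order
val df4 = df1.join(df3, Seq("order_id"))
df4.show()
df4.printSchema()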
In the snippet above, I only built some sample DataFrames to work with.
Output:
+--------+------------+--------------------+
|order_id|        info|                room|
+--------+------------+--------------------+
| order_1|order_1_info|[[order_1,room_1,...|
| order_2|order_2_info|[[order_2,room_2,...|
+--------+------------+--------------------+
Schema:
root
 |-- order_id: string (nullable = true)
 |-- info: string (nullable = true)
 |-- room: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- order_id: string (nullable = true)
 |    |    |-- room_id: string (nullable = true)
 |    |    |-- room_price: integer (nullable = false)
 |    |    |-- room_name: string (nullable = true)
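Note that the struct in this schema still carries order_id, while the schema requested in the question lists only the room fields. If that matters, one possible variation is to leave the grouping key out of the struct. The following is only a sketch under assumptions: the local SparkSession setup and the names roomCols, rooms and result are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Assumption: a local session for illustration; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("rooms-sketch").getOrCreate()
import spark.implicits._

val df1 = Seq(("order_1", "order_1_info"),
  ("order_2", "order_2_info")).toDF("order_id", "info")
val df2 = Seq(("order_1", "room_1", 100, "palace_1"),
  ("order_2", "room_2", 200, "palace_2"),
  ("order_1", "room_3", 100, "palace_3"),
  ("order_2", "room_8", 200, "palace_x"))
  .toDF("order_id", "room_id", "room_price", "room_name")

// Every column except the grouping key goes into the struct.
val roomCols = df2.columns.filter(_ != "order_id").map(c => col(c))
val rooms = df2.groupBy("order_id").agg(collect_list(struct(roomCols: _*)).as("room"))

// Same join as before; only the contents of the struct differ.
val result = df1.join(rooms, Seq("order_id"))
result.printSchema()
```

With this change the element struct should contain room_id, room_price and room_name only, and order_id stays as a top-level column.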
I hope this helps.
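One more point that may be worth checking: join(df3, Seq("order_id")) is an inner join, so an order with no rows in the rooms table would not appear in the result. If such orders should be kept, a left join is one option. This is a hedged sketch with made-up sample data (order_3 has no rooms); the session setup and the names innerJoined and leftJoined are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Assumption: a local session for illustration; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("left-join-sketch").getOrCreate()
import spark.implicits._

// order_3 deliberately has no matching rooms.
val orders = Seq(("order_1", "order_1_info"),
  ("order_3", "order_3_info")).toDF("order_id", "info")
val roomRows = Seq(("order_1", "room_1", 100, "palace_1"))
  .toDF("order_id", "room_id", "room_price", "room_name")

val cols = roomRows.columns
val rooms = roomRows.groupBy("order_id")
  .agg(collect_list(struct(cols.head, cols.tail: _*)).as("room"))

// Inner join (the default): order_3 is dropped.
val innerJoined = orders.join(rooms, Seq("order_id"))
// Left join: order_3 is kept, with a null room column.
val leftJoined = orders.join(rooms, Seq("order_id"), "left")
leftJoined.show(false)
```

Whether a null room column or a dropped row is the better outcome depends on how the output table is consumed downstream.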