I have the following input DataFrame:
+-------+----------------+------------+
|ID     |Title           |values      |
+-------+----------------+------------+
|ID-1   |First Name      |Jolly       |
|ID-1   |Middle Name     |Jr          |
|ID-1   |Last Name       |Hudson      |
|ID-2   |First Name      |Kathy       |
|ID-2   |Last Name       |Oliver      |
|ID-3   |Last Name       |Short       |
|ID-3   |Middle Name     |M           |
|ID-4   |First Name      |Denver      |
+-------+----------------+------------+
I need the output to look like this:
+-------+----------------+---------------+--------------+
|ID     |First Name      |Middle Name    |Last Name     |
+-------+----------------+---------------+--------------+
|ID-1   |Jolly           |Jr             |Hudson        |
|ID-2   |Kathy           |null           |Oliver        |
|ID-3   |null            |M              |Short         |
|ID-4   |Denver          |null           |null          |
+-------+----------------+---------------+--------------+
Please suggest possible solutions for getting this result.
Thanks in advance.
Answer 0 (score: 0)
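Here is one approach: group the dataset by ID, pivot on Title, and aggregate Values with first. Each (ID, Title) pair has at most one row in the sample data, so first simply picks that value, and any title missing for an ID comes out as null: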
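// In spark-shell these are already in scope; in an application, import them
// (spark is the active SparkSession).
import org.apache.spark.sql.functions.first
import spark.implicits._

val df = Seq(
  ("ID-1", "First Name", "Jolly"),
  ("ID-1", "Middle Name", "Jr"),
  ("ID-1", "Last Name", "Hudson"),
  ("ID-2", "First Name", "Kathy"),
  ("ID-2", "Last Name", "Oliver"),
  ("ID-3", "Last Name", "Short"),
  ("ID-3", "Middle Name", "M"),
  ("ID-4", "First Name", "Denver")
).toDF("ID", "Title", "Values")

// One row per ID, one column per distinct Title.
df.
  groupBy("ID").pivot("Title").agg(first($"Values")).
  show(false)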
// +----+----------+---------+-----------+
// |ID |First Name|Last Name|Middle Name|
// +----+----------+---------+-----------+
// |ID-1|Jolly |Hudson |Jr |
// |ID-3|null |Short |M |
// |ID-4|Denver |null |null |
// |ID-2|Kathy |Oliver |null |
// +----+----------+---------+-----------+
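The output above has its columns in alphabetical order and its rows in no particular order, whereas the question asks for First Name, Middle Name, Last Name with rows sorted by ID. A minimal sketch of one way to get that layout, assuming the three titles are known in advance: pass them to pivot explicitly (the two-argument pivot overload) and sort with orderBy. The names titles and result, and the local SparkSession setup, are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("ID-1", "First Name", "Jolly"),
  ("ID-1", "Middle Name", "Jr"),
  ("ID-1", "Last Name", "Hudson"),
  ("ID-2", "First Name", "Kathy"),
  ("ID-2", "Last Name", "Oliver"),
  ("ID-3", "Last Name", "Short"),
  ("ID-3", "Middle Name", "M"),
  ("ID-4", "First Name", "Denver")
).toDF("ID", "Title", "Values")

// Assumption: the full set of titles is known up front.
// Listing them fixes the column order and spares Spark the extra pass
// it would otherwise need to discover the distinct pivot values.
val titles = Seq("First Name", "Middle Name", "Last Name")

val result = df
  .groupBy("ID")
  .pivot("Title", titles)
  .agg(first($"Values"))
  .orderBy("ID") // deterministic row order

result.show(false)
```

With the explicit list, any Title value not named in titles is dropped from the result, so this variant only fits when the set of titles is fixed.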