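Given the following DataFrame, how can I convert the data in rows into columns? The list of property names is not fixed; that is, there may be more properties than the ones shown here. I am looking for a code example in Scala on Apache Spark.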
UserCode | PropertyName | PropertyValue
1 | First Name | Ram
1 | Last Name | Shri
1 | Address | Ayodhya
2 | First Name | Laxman
2 | Last Name | Shri
2 | Address | Ayodhya
2 | Skill | Archery
2 | Marital Status | Married
2 | Age | 23
3 | First Name | Sita
3 | Last Name | Devi
3 | Address | Ayodhya
Expected output
UserCode | First Name | Last Name | Address | Skill | Age
1 | Ram | Shri | Ayodhya | |
2 | Laxman | Shri | Ayodhya | Archery | 23
3 | Sita | Devi | Ayodhya | |
Answer 0 (score: 0)
If you can use pivot, this is quite simple.
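// Assumes an existing SparkSession named `spark` (spark-shell provides one and imports the implicits for you)
import spark.implicits._
import org.apache.spark.sql.functions.first

val df = Seq(
  (1, "First Name", "Ram"),
  (1, "Last Name", "Shri"),
  (1, "Address", "Ayodhya"),
  (2, "First Name", "Laxman"),
  (2, "Last Name", "Shri"),
  (2, "Address", "Ayodhya"),
  (2, "Skill", "Archery"),
  (2, "Marital Status", "Married"),
  (2, "Age", "23"),
  (3, "First Name", "Sita"),
  (3, "Last Name", "Devi"),
  (3, "Address", "Ayodhya")
).toDF("userCode", "propertyName", "propertyValue")

// One row per userCode, one column per distinct propertyName; missing properties become null
df.groupBy("userCode").pivot("propertyName").agg(first("propertyValue")).show()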
+--------+-------+----+----------+---------+--------------+-------+
|userCode|Address| Age|First Name|Last Name|Marital Status|  Skill|
+--------+-------+----+----------+---------+--------------+-------+
|       1|Ayodhya|null|       Ram|     Shri|          null|   null|
|       2|Ayodhya|  23|    Laxman|     Shri|       Married|Archery|
|       3|Ayodhya|null|      Sita|     Devi|          null|   null|
+--------+-------+----+----------+---------+--------------+-------+
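The expected output in the question shows blanks rather than null. Here is a minimal sketch of one way to get there, assuming the same three column names as above. `PivotSketch` and `toWide` are hypothetical names made up for illustration, not part of Spark. The sketch collects the distinct property names at run time (since the list is not known in advance), passes them to `pivot` explicitly so the column order is predictable, and replaces nulls in string columns with empty strings.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.first

object PivotSketch {
  // Hypothetical helper: long (userCode, propertyName, propertyValue) -> one row per userCode.
  // Assumes propertyName is a non-null string column.
  def toWide(df: DataFrame): DataFrame = {
    // Discover the property names at run time, sorted so the column order is stable
    val names: Seq[String] = df.select("propertyName").distinct()
      .collect().map(_.getString(0)).sorted.toSeq

    df.groupBy("userCode")
      .pivot("propertyName", names)   // explicit values determine the output columns
      .agg(first("propertyValue"))
      .na.fill("")                    // empty string instead of null in string columns
      .orderBy("userCode")
  }
}
```

With the `df` from the answer, this would be called as `PivotSketch.toWide(df).show()`. To keep only some properties, or to put them in a particular order, filter or reorder `names` before passing it to `pivot`.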