I have a DataFrame in Spark that looks like this:
+------+-----+-----+-----+
| name | id1 | id2 | id3 |
+------+-----+-----+-----+
| Bob | 1 | 10 | 100 |
| Jill | 2 | 20 | 200 |
| Sue | 3 | 30 | 300 |
| Lane | 4 | 40 | 400 |
+------+-----+-----+-----+
I'd like to explode the id columns into rows, like this:
+------+----------+---------+
| name | id_value | id_type |
+------+----------+---------+
| Bob | 1 | id1 |
| Bob | 10 | id2 |
| Bob | 100 | id3 |
| Jill | 2 | id1 |
| Jill | 20 | id2 |
| Jill | 200 | id3 |
| Sue | 3 | id1 |
| Sue | 30 | id2 |
| Sue | 300 | id3 |
| Lane | 4 | id1 |
| Lane | 40 | id2 |
| Lane | 400 | id3 |
+------+----------+---------+
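I'm using the Scala API. Is there a good way to do this? I've looked at the explode and arrays_zip functions, but I'm not sure whether I'm barking up the right tree.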
Answer 0 (score: 1)
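explode works on array and map columns. For separate, ordinary columns like these, you can select each id column on its own and combine the results with union: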
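import org.apache.spark.sql.functions.{col, lit}

// Build one (name, id_value, id_type) DataFrame per id column, then union them.
df.columns
  .collect {
    case column if column.startsWith("id") =>
      df.select(col("name"),
        col(column) as "id_value",
        lit(column) as "id_type")
  }
  .reduce(_ union _)
  .show()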
Output (rows come out grouped by id_type rather than by name):
+----+--------+-------+
|name|id_value|id_type|
+----+--------+-------+
| Bob| 1| id1|
|Jill| 2| id1|
| Sue| 3| id1|
|Lane| 4| id1|
| Bob| 10| id2|
|Jill| 20| id2|
| Sue| 30| id2|
|Lane| 40| id2|
| Bob| 100| id3|
|Jill| 200| id3|
| Sue| 300| id3|
|Lane| 400| id3|
+----+--------+-------+
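If the per-name grouping from the question matters, one possible alternative is to pack each id column into a struct, collect the structs into an array, and explode that array. The sketch below is only an illustration under assumptions: the object name ExplodeIdColumns, the helper explodeIdColumns, and the local SparkSession are all made up for the example, and it assumes every id column has the same type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object ExplodeIdColumns {

  // Hypothetical helper: turns every column whose name starts with "id"
  // into one (id_value, id_type) row per input row.
  def explodeIdColumns(input: DataFrame): DataFrame = {
    val idColumns = input.columns.filter(_.startsWith("id"))

    // One struct per id column: the value plus the column name as a literal.
    val pairs = idColumns.map { c =>
      struct(col(c).as("id_value"), lit(c).as("id_type"))
    }

    input
      .select(col("name"), explode(array(pairs: _*)).as("pair"))
      .select(col("name"), col("pair.id_value"), col("pair.id_type"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-id-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      ("Bob", 1, 10, 100),
      ("Jill", 2, 20, 200),
      ("Sue", 3, 30, 300),
      ("Lane", 4, 40, 400)
    ).toDF("name", "id1", "id2", "id3")

    explodeIdColumns(df).show()
    spark.stop()
  }
}
```

Because explode emits the array elements of one row before moving on to the next row, the three id rows for each name stay together. This also scans df once instead of once per id column, which may matter on larger data.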
Answer 1 (score: 0)
Another, perhaps less elegant, solution is to combine explode with lateral view:
-- ds is the DataFrame registered as a table or temp view
select name, id[0] as id_value, id[1] as id_type
from ds
lateral view explode(array(array(id1, 'id1'), array(id2, 'id2'), array(id3, 'id3'))) ids as id
+----+--------+-------+
|name|id_value|id_type|
+----+--------+-------+
| Bob| 1| id1|
| Bob| 10| id2|
| Bob| 100| id3|
|Jill| 2| id1|
|Jill| 20| id2|
|Jill| 200| id3|
| Sue| 3| id1|
| Sue| 30| id2|
| Sue| 300| id3|
|Lane| 4| id1|
|Lane| 40| id2|
|Lane| 400| id3|
+----+--------+-------+
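One thing to watch with the query above: array(id1, 'id1') puts an integer and a string into the same array, so as far as I know Spark has to coerce them to a common type, and id_value may not come back as an integer. A possible way around that from the Scala API is Spark SQL's stack generator via selectExpr, which keeps the two output columns separately typed. This is only a sketch: the object name StackIdColumns, the helper stackIdColumns, and the local SparkSession are invented for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StackIdColumns {

  // Hypothetical helper: builds a stack(n, id1, 'id1', id2, 'id2', ...)
  // expression from every column whose name starts with "id".
  def stackIdColumns(input: DataFrame): DataFrame = {
    val idColumns = input.columns.filter(_.startsWith("id"))

    // Each id column contributes a (value, name literal) pair to stack.
    val stackArgs = idColumns.map(c => s"`$c`, '$c'").mkString(", ")

    input.selectExpr(
      "name",
      s"stack(${idColumns.length}, $stackArgs) as (id_value, id_type)"
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("stack-id-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      ("Bob", 1, 10, 100),
      ("Jill", 2, 20, 200),
      ("Sue", 3, 30, 300),
      ("Lane", 4, 40, 400)
    ).toDF("name", "id1", "id2", "id3")

    val stacked = stackIdColumns(df)
    stacked.printSchema() // lets you check the type of id_value
    stacked.show()
    spark.stop()
  }
}
```

stack(n, ...) splits its arguments into n rows, so with three id columns each input row yields three (id_value, id_type) rows.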