How do I convert column values into a single array in Scala?

Time: 2019-06-19 07:17:05

Tags: scala apache-spark spark-structured-streaming

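I am trying to collapse every column of a DataFrame into a single array, one array per column. Does Structured Streaming support an operation that does the opposite of explode? Any suggestions are much appreciated!

I tried collect() and collectAsList(), but neither is supported on a streaming DataFrame.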

+---+---------------+----------------+--------+
|row|ADDRESS_TYPE_CD|DISCONTINUE_DATE|param_cd|
+---+---------------+----------------+--------+
|0  |1              |null            |7       |
|2  |6              |null            |1       |
+---+---------------+----------------+--------+

My result should look like this:

+------+---------------+----------------+--------+
|row   |ADDRESS_TYPE_CD|DISCONTINUE_DATE|param_cd|
+------+---------------+----------------+--------+
|[0, 2]|[1, 6]         |[null, null]    |[7, 1]  |
+------+---------------+----------------+--------+

1 Answer:

Answer 0 (score: 0)

For example, you can apply collect_list to every column. It would look like this:

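import org.apache.spark.sql.functions.{col, collect_list}
// Build one collect_list aggregate per column, keeping the original column name
val aggs = df.columns.map(c => collect_list(col(c)).as(c))
df.select(aggs: _*).show()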
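+------+---------------+----------------+--------+
|   row|ADDRESS_TYPE_CD|DISCONTINUE_DATE|param_cd|
+------+---------------+----------------+--------+
|[0, 2]|         [1, 6]|              []|  [7, 1]|
+------+---------------+----------------+--------+

Note that collect_list skips null values, so an all-null column such as DISCONTINUE_DATE comes back as an empty array rather than [null, null].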
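
Since the question is about Structured Streaming and wants the nulls kept, here is a minimal sketch of one way to adapt the same idea. It is not the answerer's code: the object and method names (CollectColumns, collectAllColumns, startConsoleQuery) are made up for illustration. It assumes Spark 2.4 or later (for flatten), and it assumes that writing the aggregate to the console sink in "complete" output mode is acceptable for your job. Keep in mind that a global collect_list keeps every row seen so far in state, so this is only reasonable for small or bounded streams.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, collect_list, flatten}
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical helper object; the names here are illustrative, not from the answer.
object CollectColumns {

  // One array per column. Each value is first wrapped in a single-element array,
  // because collect_list skips nulls but keeps a non-null array that holds a null.
  // flatten (Spark 2.4+) then merges the collected arrays back into one array.
  def collectAllColumns(df: DataFrame): DataFrame = {
    val aggs = df.columns.map(c => flatten(collect_list(array(col(c)))).as(c))
    df.select(aggs: _*)
  }

  // Assumes streamingDf was created with spark.readStream. An aggregation without
  // a watermark needs "complete" or "update" output mode, and show() is not
  // available on a streaming DataFrame, so the console sink is used instead.
  def startConsoleQuery(streamingDf: DataFrame): StreamingQuery =
    collectAllColumns(streamingDf)
      .writeStream
      .outputMode("complete")
      .format("console")
      .option("truncate", "false")
      .start()

  // Batch demo with the data from the question, to try the helper locally.
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collect-columns")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (0, 1, Option.empty[String], 7),
      (2, 6, Option.empty[String], 1)
    ).toDF("row", "ADDRESS_TYPE_CD", "DISCONTINUE_DATE", "param_cd")

    collectAllColumns(df).show(false)
    spark.stop()
  }
}
```

For a streaming source you would call something like CollectColumns.startConsoleQuery(streamDf).awaitTermination(). The order of elements inside each array is not guaranteed by collect_list, so do not rely on the arrays lining up row by row across columns.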