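I am trying to collapse every column of a DataFrame into a single array per column. Does Structured Streaming support an operation that does the opposite of explode? Any suggestions are much appreciated!
I tried collect() and collectAsList(), but neither is supported on a streaming DataFrame.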
+---+---------------+----------------+--------+
|row|ADDRESS_TYPE_CD|DISCONTINUE_DATE|param_cd|
+---+---------------+----------------+--------+
|0 |1 |null |7 |
|2 |6 |null |1 |
+---+---------------+----------------+--------+
My result should look like this:
+---+---------------+----------------+--------+
|row|ADDRESS_TYPE_CD|DISCONTINUE_DATE|param_cd|
+---+---------------+----------------+--------+
|[0,2]|[1,6]|[null,null]|[7,1]|
+---+---------------+----------------+--------+
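Answer 0 (score: 0)
For example, you can use collect_list on all columns. Keep in mind that collect_list skips null values, so if DISCONTINUE_DATE holds real nulls rather than the string "null", that column would come back as an empty array instead of the [null, null] shown in the output. It would look like this: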
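import org.apache.spark.sql.functions.{col, collect_list}
// One collect_list aggregate per column, aliased back to the column name.
val aggs = df.columns.map(c => collect_list(col(c)).as(c))
df.select(aggs: _*).show()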
+------+---------------+----------------+--------+
| row|ADDRESS_TYPE_CD|DISCONTINUE_DATE|param_cd|
+------+---------------+----------------+--------+
|[0, 2]| [1, 6]| [null, null]| [7, 1]|
+------+---------------+----------------+--------+
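The show() call above only works on a batch DataFrame. On a streaming DataFrame the same aggregation has to be started through writeStream. Here is a minimal sketch of that, assuming the built-in "rate" source as a stand-in for the real stream, a console sink, and "complete" output mode; the object name CollectColumnsStreaming and the helper collectAllColumns are made up for illustration. I have not checked it against your exact Spark version, so treat it as a starting point.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

object CollectColumnsStreaming {
  // Hypothetical helper: one collect_list aggregate per column,
  // each aliased back to the original column name.
  def collectAllColumns(df: DataFrame): DataFrame = {
    val aggs = df.columns.map(c => collect_list(col(c)).as(c))
    df.select(aggs: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("collect-columns-streaming")
      .getOrCreate()

    // Assumption: the "rate" source replaces the real input stream.
    // It emits rows with the columns "timestamp" and "value".
    val stream = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "1")
      .load()

    // A global aggregation without a watermark cannot use "append" mode,
    // so this sketch uses "complete", which re-emits the full result per trigger.
    val query = collectAllColumns(stream)
      .writeStream
      .outputMode("complete")
      .format("console")
      .option("truncate", "false")
      .start()

    // Let the query run for about ten seconds, then shut down.
    query.awaitTermination(10000)
    query.stop()
    spark.stop()
  }
}
```

One caveat with this approach: in complete mode the arrays contain every row seen since the query started, so the state keeps growing for as long as the stream runs. Also, collect_list does not guarantee any particular element order.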