Say I have a Hive table that looks like this:
ID event order_num
------------------------
A red 2
A blue 1
A yellow 3
B yellow 2
B green 1
...
I tried using collect_list to generate the list of events for each ID, as follows:
SELECT ID,
collect_list(event) as events_list
FROM table
GROUP BY ID;
However, within each ID that I group by, I need the events sorted by order_num, so that my result table looks like this:
ID events_list
------------------------
A ["blue","red","yellow"]
B ["green","yellow"]
I can't do a global sort by ID and order_num before the collect_list() query, because the table is huge. Is there a way to sort by order_num within collect_list?
Thanks!
Answer 0: (score: 2)
So, I found the answer here. The trick is to use a subquery with DISTRIBUTE BY and SORT BY clauses.
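A minimal sketch of what such a subquery could look like, in case it helps. The table name `table1` and the alias `t` are placeholders of mine, not from the linked answer; the columns are the ones in the question. As far as I know, Hive does not formally guarantee that collect_list() preserves the incoming row order, so treat this as something to verify on your own version rather than a definitive solution.

```sql
-- Hypothetical table name `table1`; columns ID, event, order_num as in the question.
SELECT ID,
       collect_list(event) AS events_list
FROM (
    -- DISTRIBUTE BY routes all rows with the same ID to the same reducer.
    -- SORT BY only orders rows within each reducer, so no global sort is needed.
    SELECT ID, event, order_num
    FROM table1
    DISTRIBUTE BY ID
    SORT BY ID, order_num
) t
GROUP BY ID;
```

The point of using SORT BY rather than ORDER BY is that ORDER BY would force a total ordering over the whole table, which is exactly what the question is trying to avoid.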
Answer 1: (score: 0)
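The function sort_array() should sort the items returned by collect_list(). Note that it sorts the events by their own values (alphabetically here), not by order_num; the two orders only happen to coincide for the sample data.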
select ID, sort_array(collect_list(event)) as events_list
from table
group by ID;
Answer 2: (score: 0)
Try the following:
WITH tmp AS (
-- ascending order_num, to match the order requested in the question
SELECT * FROM data DISTRIBUTE BY ID SORT BY ID, order_num
)
SELECT ID, collect_list(event)
FROM tmp
GROUP BY ID