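I am trying to group rows that have the same values in two columns and rank/sort the result based on a third column.
The result should contain all the other columns.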
Given the following table, I want to group or filter by columns c1 and c2, ordered by the time in c3:
with sample as (
select 'A' as c1, 'B' as c2, '22:00' as c3, 'Da' as c4
union all
select 'A' as c1, 'B' as c2, '23:00' as c3, 'Db' as c4
union all
select 'A' as c1, 'B' as c2, '09:00' as c3, 'Dc' as c4
union all
select 'A' as c1, 'C' as c2, '22:00' as c3, 'Dd' as c4
union all
select 'B' as c1, 'C' as c2, '09:00' as c3, 'De' as c4
)
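All the other columns, such as c4, c5 and so on, should be kept, but they have no effect on the grouping criteria or the rank.
I believe a window function partitioned by c1 and c2 and ordered by c3 could work, but I am not sure it is the best approach for very large tables or when more columns need to be grouped.
The final output should be a UNIQUE row per group, the one where the rank is 1 (top). The columns should be exactly the same as in the sample table (no rank column).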
Ranking the rows with
row_number() over (partition by c1, c2 order by c3) as rnk
gives:
+----+----+-------+----+-----+
| c1 | c2 | c3    | c4 | rnk |
+----+----+-------+----+-----+
| A  | B  | 09:00 | Dc | 1   |
| A  | B  | 22:00 | Da | 2   |
| A  | B  | 23:00 | Db | 3   |
| A  | C  | 22:00 | Dd | 1   |
| B  | C  | 09:00 | De | 1   |
+----+----+-------+----+-----+
A query such as
Select * from tableX where rnk = 1
would do the job, but it keeps the column 'rnk'.
I want to avoid writing out all the columns in the select just to exclude rnk.
*Edited to add the final table.
Answer 0 (score: 3)
select inline(array(rec))
from (select struct(*) as rec
,row_number() over
(
partition by c1,c2
order by c3
) as rn
from sample t
) t
where rn = 1
;
+------+------+-------+------+
| col1 | col2 | col3 | col4 |
+------+------+-------+------+
| A | B | 09:00 | Dc |
| A | C | 22:00 | Dd |
| B | C | 09:00 | De |
+------+------+-------+------+
P.S. Note that the column names are aliased (col1, col2, ...) because of the use of struct.
Answer 1 (score: 0)
I think you just want row_number():
select t.*,
row_number() over (partition by c1, c2 order by c3) as rnk
from sample t;
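The question seems to have changed since I answered it, which is a rather rude thing to do. If you want only the top-ranked row of each group, use a subquery: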
select t.*
from (select t.*,
row_number() over (partition by c1, c2 order by c3) as rnk
from sample t
) t
where rnk = 1;
This returns one row for each c1/c2 combination in the data. If you want all rows in the event of ties, use rank() instead of row_number().
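To illustrate the rank() variant, here is a minimal sketch. It reuses part of the sample data from the question and adds one hypothetical row ('Df', not in the original table) that ties with 'Dc' on c3 within the A/B group. The aliases s and r are also just illustrative.

```sql
-- Sketch only: the 'Df' row is a made-up tie added for illustration.
with sample as (
    select 'A' as c1, 'B' as c2, '22:00' as c3, 'Da' as c4
    union all
    select 'A' as c1, 'B' as c2, '09:00' as c3, 'Dc' as c4
    union all
    select 'A' as c1, 'B' as c2, '09:00' as c3, 'Df' as c4  -- hypothetical tie on c3
    union all
    select 'A' as c1, 'C' as c2, '22:00' as c3, 'Dd' as c4
)
select r.*
from (select s.*,
             -- rank() gives tied rows the same rank, so both 09:00 rows get 1
             rank() over (partition by c1, c2 order by c3) as rnk
      from sample s
     ) r
where r.rnk = 1;
```

With row_number() only one of the two 09:00 rows would be returned, and which one is not defined unless the order by includes a tie-breaker. With rank() both come back. Note that, as with the subquery above, the rnk column is still part of the output.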