Question

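I'm trying to group rows that have the same value across two columns and have the result ranked/sorted based on a third column.

The result should contain all the other columns.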

For the table below, grouping (or filtering) on columns c1 and c2 and ordering by the time in c3 should rank the rows within each group:

with sample as (
  select 'A' as c1, 'B' as c2, '22:00' as c3, 'Da' as c4
  union all
  select 'A' as c1, 'B' as c2, '23:00' as c3, 'Db' as c4
  union all
  select 'A' as c1, 'B' as c2, '09:00' as c3, 'Dc' as c4
  union all
  select 'A' as c1, 'C' as c2, '22:00' as c3, 'Dd' as c4
  union all
  select 'B' as c1, 'C' as c2, '09:00' as c3, 'De' as c4
)

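All the other columns (c4, c5, and so on) should be kept, but they have no effect on the grouping criteria or the rank.

I believe a window function partitioned by c1 and c2 and ordered by c3 could work, but I'm not sure it is the best approach for very large tables or when more grouping columns are needed.

The final output would be unique rows where the rank is 1 (the top). The columns should be exactly the same as in the sample table (no rank column).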

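The ranked output, computed with row_number() over (partition by c1, c2 order by c3) as rnk:

+----+----+-------+----+-----+
| c1 | c2 | c3    | c4 | rnk |
+----+----+-------+----+-----+
| A  | B  | 09:00 | Dc | 1   |
| A  | B  | 22:00 | Da | 2   |
| A  | B  | 23:00 | Db | 3   |
| A  | C  | 22:00 | Dd | 1   |
| B  | C  | 09:00 | De | 1   |
+----+----+-------+----+-----+

Select * from tableX where rnk = 1

could do the job, but it keeps the column 'rnk'. I want to avoid writing out every column in the select just to exclude rnk.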

*Edited to add the final table

Answer 1

select  inline(array(rec))

from   (select  struct(*)   as rec

               ,row_number() over 
                (
                    partition by    c1,c2 
                    order by        c3
                ) as rn

        from    sample t
        ) t

where   rn = 1
;

+------+------+-------+------+
| col1 | col2 | col3  | col4 |
+------+------+-------+------+
| A    | B    | 09:00 | Dc   |
| A    | C    | 22:00 | Dd   |
| B    | C    | 09:00 | De   |
+------+------+-------+------+

P.S. Please note that the column names are aliased (col1, col2, ...) because of the use of struct.
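If the col1..col4 aliases are a problem, one alternative worth trying is Hive's regex column specification, which selects every column except the ones matching a pattern. The sketch below is not part of the original answer. It assumes a Hive version where hive.support.quoted.identifiers can be set to none (required on 0.13 and later for backquoted regex patterns), and it reuses the question's sample data and the rnk column name.

```sql
-- Assumption: backquoted names are treated as regex patterns only
-- when quoted identifiers are disabled for the session.
set hive.support.quoted.identifiers=none;

with sample as (
  select 'A' as c1, 'B' as c2, '22:00' as c3, 'Da' as c4
  union all
  select 'A' as c1, 'B' as c2, '23:00' as c3, 'Db' as c4
  union all
  select 'A' as c1, 'B' as c2, '09:00' as c3, 'Dc' as c4
  union all
  select 'A' as c1, 'C' as c2, '22:00' as c3, 'Dd' as c4
  union all
  select 'B' as c1, 'C' as c2, '09:00' as c3, 'De' as c4
)
-- `(rnk)?+.+` matches every column name except exactly "rnk",
-- so the original names c1..c4 are kept and rnk is dropped.
select `(rnk)?+.+`
from (
  select t.*,
         row_number() over (partition by c1, c2 order by c3) as rnk
  from sample t
) t
where rnk = 1;
```

The trade-off is that the setting changes how backquoted identifiers are parsed for the whole session, so it may be safer to set it back afterwards.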

Answer 2

I think you just want row_number():

select t.*,
       row_number() over (partition by c1, c2 order by c3) as rnk
from sample t;

The question seems to have changed since I answered it, which is a rather rude thing to do. If you want only the top-ranked row, use a subquery:

select t.*
from (select t.*,
             row_number() over (partition by c1, c2 order by c3) as rnk
      from sample t
     ) t
where rnk = 1;

This returns one row for each c1/c2 combination in the data. If you want all rows in the case of ties, use rank() instead of row_number().
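To make the difference concrete, here is a small sketch with a tie on c3. The 'Dx' row is hypothetical and was added only to create the tie; it is not in the question's data.

```sql
with sample as (
  select 'A' as c1, 'B' as c2, '09:00' as c3, 'Dc' as c4
  union all
  select 'A' as c1, 'B' as c2, '09:00' as c3, 'Dx' as c4  -- hypothetical row tying on c3
  union all
  select 'A' as c1, 'B' as c2, '22:00' as c3, 'Da' as c4
)
select t.*
from (
  select t.*,
         -- rank() gives tied rows the same rank, so both 09:00 rows get rnk = 1
         rank() over (partition by c1, c2 order by c3) as rnk
  from sample t
) t
where rnk = 1;
```

With rank(), both 09:00 rows come back. With row_number(), only one of them would, and which one is not defined unless the order by includes a tie-breaker such as c4.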

Hive - Select unique rows based on some columns
