Hive: top K records by sum() per group key

Time: 2017-04-10 15:29:57

Tags: hive

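For a table TBL with columns A, B, C, I want to group by A, B and select A, B, sum(C), keeping only the top K values of B (ranked by sum(C)) within each A.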

Without the limit, this is:

select A, B, sum(C) from TBL group by A, B

With the values

A | B | C
--+---+----
a | 1 | 10
a | 2 | 20
a | 1 | 5
a | 3 | 12
b | 3 | 100
b | 2 | 90
b | 1 | 120
c | 5 | 10

and a limit of 2, the result would be:

A | B | sum(C)
--+---+-------
a | 1 | 15
a | 2 | 20
b | 1 | 120
b | 3 | 100
c | 5 | 10
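To make the requested semantics concrete, here is a minimal Python sketch (an illustration outside Hive, not part of the question) that reproduces the expected result from the sample rows: it first sums C per (A, B), then keeps the K largest sums within each A. The names ROWS and top_k_sums are hypothetical.

```python
from collections import defaultdict
import heapq

# Sample rows (A, B, C) copied from the question.
ROWS = [
    ("a", 1, 10), ("a", 2, 20), ("a", 1, 5), ("a", 3, 12),
    ("b", 3, 100), ("b", 2, 90), ("b", 1, 120),
    ("c", 5, 10),
]

def top_k_sums(rows, k):
    """Return (A, B, sum(C)) for the k B-values with the largest sum(C) within each A."""
    # Step 1: the equivalent of GROUP BY A, B with SUM(C).
    sums = defaultdict(int)
    for a, b, c in rows:
        sums[(a, b)] += c
    # Step 2: collect the per-B sums under each A.
    by_a = defaultdict(list)
    for (a, b), total in sums.items():
        by_a[a].append((b, total))
    # Step 3: keep the k largest sums per A; on ties, nlargest keeps the earlier item.
    result = []
    for a in sorted(by_a):
        for b, total in heapq.nlargest(k, by_a[a], key=lambda bt: bt[1]):
            result.append((a, b, total))
    return result

print(top_k_sums(ROWS, 2))
```

With k = 2 this yields the same five rows as the expected result above (row order aside).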

2 Answers:

Answer 0 (score: 1)

+---+---+-------+
| a | b | sum_c |
+---+---+-------+
| a | 2 |    20 |
| a | 1 |    15 |
| b | 1 |   120 |
| b | 3 |   100 |
| c | 5 |    10 |
+---+---+-------+

Answer 1 (score: 0)

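You can use windowing functions to achieve this: aggregate with GROUP BY first, then number the rows within each a by descending sum with ROW_NUMBER() and keep the first K (here K = 2).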

Query

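SELECT a, b, c
FROM (
  SELECT *
    -- number the (a, b) groups within each a, largest sum first
    , ROW_NUMBER() OVER (PARTITION BY a ORDER BY c DESC) AS rn
  FROM (
    -- one row per (A, B) with its sum(C)
    SELECT A      AS a
      , B      AS b
      , SUM(C) AS c
    FROM TBL
    GROUP BY A, B ) x ) y
WHERE rn <= 2  -- K = 2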

Output

a       b       c
a       2       20
a       1       15
b       1       120
b       3       100
c       5       10
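The windowing query can be mirrored step by step in Python as a sketch (an illustration, not how Hive executes it): sort the aggregated rows so each partition of a is contiguous and ordered by c descending, number them with enumerate to stand in for ROW_NUMBER(), and keep rows with a number of at most K. The names AGGREGATED and row_number_filter are hypothetical; AGGREGATED is assumed to be the output of the inner GROUP BY subquery on the question's sample data.

```python
from itertools import groupby

# Assumed output of the inner subquery: SELECT A, B, SUM(C) ... GROUP BY A, B
AGGREGATED = [
    ("a", 1, 15), ("a", 2, 20), ("a", 3, 12),
    ("b", 1, 120), ("b", 2, 90), ("b", 3, 100),
    ("c", 5, 10),
]

def row_number_filter(rows, k):
    """Emulate ROW_NUMBER() OVER (PARTITION BY a ORDER BY c DESC), then WHERE rn <= k."""
    # Sort so each partition is contiguous and ordered by c descending.
    # Python's sort is stable, so tied c values keep input order here,
    # whereas ROW_NUMBER() in Hive gives no guaranteed order among ties.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    kept = []
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        # enumerate(..., start=1) plays the role of ROW_NUMBER().
        for rn, row in enumerate(partition, start=1):
            if rn <= k:
                kept.append(row)
    return kept

for a, b, c in row_number_filter(AGGREGATED, 2):
    print(a, b, c)
```

The printed rows match the output above. If tied sums should all be kept, Hive's RANK() or DENSE_RANK() can replace ROW_NUMBER() in the query.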