SQL - LIMIT by GROUP BY

Time: 2018-06-05 16:59:26

Tags: sql hiveql

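I am trying to write a query that pulls the top 1000 posts (by time_spent) for each of a set of specified tags. I came up with the following query, where 1, 2, 3 are the specified tags: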

SELECT g.tagid, e.post_id, SUM(e.time_spent) AS time
  FROM   post_table e
  JOIN (SELECT g.postid, g.tagid
          FROM   tags_table g
          WHERE  g.tagid IN (1, 2, 3)) g
       ON e.post_id = g.postid
  WHERE dt >= '2018-06-01'
  GROUP BY g.tagid, e.post_id
  ORDER BY time DESC
  LIMIT  1000

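However, the problem with LIMIT 1000 here is that it applies to the whole result set, so I get 1000 rows in total rather than 1000 rows for each of tag 1, tag 2, and tag 3 (i.e. 3000 rows in total).

How can I modify this query so that the LIMIT applies only to the e.post_id component of the GROUP BY? Or is there another way to get 1000 results for each tag specified in the GROUP BY clause?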

1 answer:

Answer 0: (score: 1)

Use row_number() to number the posts within each tag, then keep only the first 1000 per tag:

SELECT ge.*
FROM (SELECT g.tagid, e.post_id, SUM(e.time_spent) AS time,
             -- DESC so that seqnum = 1 is the post with the most time spent in each tag
             ROW_NUMBER() OVER (PARTITION BY g.tagid ORDER BY SUM(e.time_spent) DESC) AS seqnum
      FROM post_table e JOIN
           tags_table g
           ON e.post_id = g.postid
      WHERE g.tagid IN (1, 2, 3) AND dt >= '2018-06-01'
      GROUP BY g.tagid, e.post_id
     ) ge
WHERE seqnum <= 1000
ORDER BY ge.tagid, time DESC
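
To check the per-tag limit on something small, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. Several things here are assumptions for illustration only: the toy rows, the limit of 2 instead of 1000 (TOP_N), the alias total_time, and SQLite standing in for Hive (window functions need SQLite 3.25 or newer). The aggregation is also split into its own inner subquery so the ranking step is easier to read; the idea is the same as in the query above.

```python
import sqlite3

# Toy stand-ins for post_table and tags_table; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_table (post_id INTEGER, time_spent INTEGER, dt TEXT);
    CREATE TABLE tags_table (postid INTEGER, tagid INTEGER);
    INSERT INTO post_table VALUES
        (10, 5,  '2018-06-02'),
        (10, 7,  '2018-06-03'),
        (11, 30, '2018-06-02'),
        (12, 1,  '2018-06-04'),
        (13, 20, '2018-06-02'),
        (14, 50, '2018-05-20');
    INSERT INTO tags_table VALUES
        (10, 1), (11, 1), (12, 1),
        (13, 2), (10, 2),
        (14, 3);
""")

TOP_N = 2  # hypothetical small limit; the question uses 1000

query = """
    SELECT r.tagid, r.post_id, r.total_time
    FROM (SELECT t.tagid, t.post_id, t.total_time,
                 ROW_NUMBER() OVER (PARTITION BY t.tagid
                                    ORDER BY t.total_time DESC) AS seqnum
          FROM (SELECT g.tagid, e.post_id, SUM(e.time_spent) AS total_time
                FROM post_table e
                JOIN tags_table g ON e.post_id = g.postid
                WHERE g.tagid IN (1, 2, 3) AND e.dt >= '2018-06-01'
                GROUP BY g.tagid, e.post_id
               ) t
         ) r
    WHERE r.seqnum <= ?
    ORDER BY r.tagid, r.total_time DESC
"""

# Each tag contributes at most TOP_N rows, ranked by total time spent.
rows = conn.execute(query, (TOP_N,)).fetchall()
for row in rows:
    print(row)
```

With this data, tag 1 has three qualifying posts but only its top two are returned, tag 2 returns both of its posts, and tag 3 returns nothing because its only post falls before the dt cutoff. A tag with fewer than N qualifying posts simply yields fewer rows.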