How to sort and paginate groups of rows based on the first row in each group?

Date: 2015-07-24 17:55:12

Tags: sql postgresql sorting window-functions

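I have items with a timestamp and a foreign key ID. I want to group them by the foreign key, sort each group by timestamp, take the first 3 from each group, and sort all groups by the timestamp of their first item, like this: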

+----+-------+-------+-------+
| id | item1 | item2 | item3 |
+----+-------+-------+-------+
| A  |     1 |    13 |    99 |
| B  |    10 |    20 |    21 |
| C  |    50 |    51 |    60 |
| D  |    56 |    70 |    75 |
+----+-------+-------+-------+

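I'd also like to be able to select a range based on the ts of the first item. For example, a query with ts > 5 AND ts < 55 would exclude A and D. Note that C contains a row with ts = 60, but I still want to include it, because the first element in that group has ts = 50.

My current approach is to find the ID of the first item in each set in a subquery, then select the top N for those IDs. This seems less than ideal; we end up doing the same sort twice.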

SELECT *
FROM   (SELECT Row_number()
                 OVER (
                   partition BY things.id
                   ORDER BY links.created_at) AS r2,
               links.*
        FROM   things
               INNER JOIN links
                       ON ( links.b_id = things.id )
        WHERE  b_id IN (SELECT thing_id
                               FROM
               (SELECT Row_number()
                         OVER (
                           partition BY links.b_id
                           ORDER BY links.created_at) AS
                       r,
                       b_id                           AS
                       thing_id,
                       created_at
                FROM   links
                WHERE  links.entity_b_type = 'thing'
                       AND links.user_id =
                           '1234') tmp
                               WHERE  r = 1
                                      AND created_at < some_time)) tmp
WHERE  r2 <= 5;

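Can I somehow sort the original result (r <= 3) without a second select pass?

1 Answer:

Answer 0: (score: 1)

Assuming referential integrity between things and links, the query you show can be simplified to: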

SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY b_id ORDER BY created_at) AS rn
   FROM   links l
   WHERE  EXISTS (
      SELECT 1
      FROM   links l1
      WHERE  l1.b_id = l.b_id
      AND    l1.entity_b_type = 'thing'
      AND    l1.user_id = '1234'  -- why quoted? not integer?
      AND    l1.created_at < some_time
      )
   ) l
JOIN   things t ON t.id = l.b_id 
WHERE  l.rn <= 5;

Depending on data distribution, chances are good that a LATERAL solution is faster:

SELECT *
FROM   things t 
     , LATERAL (
   SELECT *, row_number() OVER (ORDER BY created_at) AS rn  -- optional info
   FROM   links l
   WHERE  l.b_id = t.id
   ORDER  BY created_at
   LIMIT  5
   ) l
WHERE  EXISTS (
   SELECT 1
   FROM   links l
   WHERE  l.b_id = t.id
   AND    l.entity_b_type = 'thing'
   AND    l.user_id = '1234'
   AND    l.created_at < some_time
   );

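Detailed explanation (chapter "2a. LATERAL join"):

The key to performance is matching indexes. Indexes always depend on the complete picture, but these would make the query very fast: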

CREATE INDEX links_idx1 ON links (user_id, entity_b_type, created_at);
CREATE INDEX links_idx2 ON links (b_id, created_at);
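
To check whether the planner actually uses these indexes, the query can be run under EXPLAIN. The following is only a sketch: the timestamp literal is a made-up stand-in for some_time, and the plan you get depends on your data and PostgreSQL version. Keep in mind that EXPLAIN with ANALYZE really executes the query.

```sql
-- Sketch: the timestamp literal is a hypothetical stand-in for some_time.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   things t
CROSS  JOIN LATERAL (
   SELECT *
   FROM   links l
   WHERE  l.b_id = t.id
   ORDER  BY l.created_at          -- links_idx2 (b_id, created_at) matches this
   LIMIT  5
   ) l
WHERE  EXISTS (
   SELECT 1
   FROM   links l1
   WHERE  l1.b_id = t.id
   AND    l1.entity_b_type = 'thing'
   AND    l1.user_id = '1234'
   AND    l1.created_at < timestamp '2015-07-01 00:00'
   );
```

An index scan on links_idx2 inside the lateral subquery would be the hoped-for outcome, but the planner may well choose something else.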

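It seems questionable to first check whether the first links.created_at for the given predicates entity_b_type = 'thing' AND user_id = '1234' is old enough, but then go on to take the oldest rows per b_id regardless of those predicates. If that's a mistake, the query might be simplified further.

Untested. It's hard to say more without basic information.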
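
If the predicates are indeed meant to apply to the returned rows as well, one possible simplification is a single pass with two window functions. This is a sketch under that assumption, not a tested solution; first_ts and the timestamp literal (a stand-in for some_time) are names and values introduced here for illustration.

```sql
-- Sketch only: assumes the filters on entity_b_type and user_id should apply
-- to every returned row, not just to the row that qualifies the group.
SELECT *
FROM  (
   SELECT l.*
        , row_number()      OVER w                   AS rn        -- position within the group
        , min(l.created_at) OVER (PARTITION BY b_id) AS first_ts  -- first timestamp of the group
   FROM   links l
   WHERE  l.entity_b_type = 'thing'
   AND    l.user_id = '1234'
   WINDOW w AS (PARTITION BY b_id ORDER BY created_at)
   ) l
WHERE  l.rn <= 5
AND    l.first_ts < timestamp '2015-07-01 00:00'   -- hypothetical stand-in for some_time
ORDER  BY l.first_ts, l.b_id, l.rn;                -- groups sorted by their first item
```

Since first_ts is the same for all rows of a group, filtering on it keeps or drops whole groups, so a group whose first row is in range keeps its later rows even when those fall outside the range.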
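
For paging over groups rather than rows, one option is to pick a page of groups first and fetch the top rows per group afterwards. Again a sketch only: the range bounds, the page size of 10 groups, and the alias first_ts are assumptions made for illustration, and the predicates are assumed to apply to all returned rows.

```sql
-- Sketch only. Assumptions (not from the original post): created_at is a
-- timestamp, and the range bounds and page size below are made-up values.
SELECT f.first_ts, l.*
FROM  (
   SELECT b_id, min(created_at) AS first_ts     -- first timestamp per group
   FROM   links
   WHERE  entity_b_type = 'thing'
   AND    user_id = '1234'
   GROUP  BY b_id
   HAVING min(created_at) > timestamp '2015-01-01'   -- range on the first item only
   AND    min(created_at) < timestamp '2015-07-01'
   ORDER  BY first_ts, b_id                     -- b_id as tie-breaker for a stable order
   LIMIT  10 OFFSET 0                           -- one page of groups
   ) f
CROSS  JOIN LATERAL (
   SELECT *
   FROM   links l
   WHERE  l.b_id = f.b_id
   AND    l.entity_b_type = 'thing'
   AND    l.user_id = '1234'
   ORDER  BY l.created_at
   LIMIT  3                                     -- first 3 items per group
   ) l
ORDER  BY f.first_ts, f.b_id, l.created_at;
```

The LIMIT in the subquery counts groups, so each page returns up to 10 groups with up to 3 rows each.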