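I have items with a timestamp and a foreign key ID. I want to group them by the foreign key, sort each group by timestamp, take the first 3 from each group, and also sort the groups by the timestamp of their first item, like this: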
+----+-------+-------+-------+
| id | item1 | item2 | item3 |
+----+-------+-------+-------+
| A  |     1 |    13 |    99 |
| B  |    10 |    20 |    21 |
| C  |    50 |    51 |    60 |
| D  |    56 |    70 |    75 |
+----+-------+-------+-------+
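I also want to be able to select a range based on the ts of the first item. A query for ts > 5 AND ts < 55 would exclude A and D. Note that C contains a row with ts = 60, but I still want to include it, because the first element in that group has ts = 50.
My current approach is to find the ID of the first item in each set in a subquery and then select the top N for those IDs. This does not seem ideal, because we end up repeating the same sort twice.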
SELECT *
FROM   (SELECT Row_number()
                 OVER (
                   partition BY things.id
                   ORDER BY links.created_at) AS r2,
               links.*
        FROM   things
               INNER JOIN links
                       ON ( links.b_id = things.id )
        WHERE  b_id IN (SELECT thing_id
                        FROM   (SELECT Row_number()
                                         OVER (
                                           partition BY links.b_id
                                           ORDER BY links.created_at) AS r,
                                       b_id AS thing_id,
                                       created_at
                                FROM   links
                                WHERE  links.entity_b_type = 'thing'
                                       AND links.user_id = '1234') tmp
                        WHERE  r = 1
                               AND created_at < some_time)) tmp
WHERE  r2 <= 5;
Can I somehow sort the original results (r <= 3) without a second select pass?
Answer 0 (score: 1)
Assuming referential integrity between things and links, the query you show can be simplified to:
SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY b_id ORDER BY created_at) AS rn
   FROM   links l
   WHERE  EXISTS (
      SELECT 1
      FROM   links l1
      WHERE  l1.b_id = l.b_id                -- was "l.bid", which is not a column
      AND    l1.entity_b_type = 'thing'
      AND    l1.user_id = '1234'             -- why quoted? not integer?
      AND    l1.created_at < some_time
      )
   ) l
JOIN   things t ON t.id = l.b_id
WHERE  l.rn <= 5;
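To see what the simplified query returns, here is a minimal sketch that runs it against the sample groups from the question. It makes several assumptions that are not in the original post: it uses Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) instead of PostgreSQL, the table layout is reduced to the columns the query touches, timestamps are plain integers, group B gets a fourth row so that the limit is visible, some_time is set to 55, and the limit is 3 rather than 5. The names build_demo_db and top_n_per_group are made up for this illustration, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data: the four groups from the question, plus one extra
# row in B (ts = 30) so that the "first 3 per group" limit has something to cut.
SAMPLE = {"A": [1, 13, 99], "B": [10, 20, 21, 30], "C": [50, 51, 60], "D": [56, 70, 75]}

# The simplified query from the answer, with named parameters for the
# cutoff and the per-group limit.
SIMPLIFIED_QUERY = """
SELECT ranked.b_id, ranked.created_at
FROM  (
   SELECT l.*, row_number() OVER (PARTITION BY b_id ORDER BY created_at) AS rn
   FROM   links l
   WHERE  EXISTS (
      SELECT 1
      FROM   links l1
      WHERE  l1.b_id = l.b_id
      AND    l1.entity_b_type = 'thing'
      AND    l1.user_id = '1234'
      AND    l1.created_at < :some_time
      )
   ) ranked
JOIN   things t ON t.id = ranked.b_id
WHERE  ranked.rn <= :n
ORDER  BY ranked.b_id, ranked.rn
"""


def build_demo_db():
    """Create an in-memory database with minimal things/links tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE things (id TEXT PRIMARY KEY);
        CREATE TABLE links (
            b_id          TEXT REFERENCES things(id),
            entity_b_type TEXT,
            user_id       TEXT,
            created_at    INTEGER
        );
    """)
    for thing_id, stamps in SAMPLE.items():
        conn.execute("INSERT INTO things VALUES (?)", (thing_id,))
        conn.executemany(
            "INSERT INTO links VALUES (?, 'thing', '1234', ?)",
            [(thing_id, ts) for ts in stamps],
        )
    return conn


def top_n_per_group(conn, some_time, n):
    """Run the query and collect the timestamps per b_id."""
    groups = {}
    params = {"some_time": some_time, "n": n}
    for b_id, created_at in conn.execute(SIMPLIFIED_QUERY, params):
        groups.setdefault(b_id, []).append(created_at)
    return groups


if __name__ == "__main__":
    print(top_n_per_group(build_demo_db(), some_time=55, n=3))
```

With these inputs D drops out because none of its rows is older than the cutoff, while C keeps its ts = 60 row, which is the behaviour the question asks for.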
Depending on data distribution, chances are good that a LATERAL solution is faster:
SELECT *
FROM   things t
,      LATERAL (
   SELECT *, row_number() OVER (ORDER BY created_at) AS rn  -- optional info
   FROM   links l
   WHERE  l.b_id = t.id
   ORDER  BY created_at
   LIMIT  5
   ) l
WHERE  EXISTS (
   SELECT 1
   FROM   links l
   WHERE  l.b_id = t.id
   AND    l.entity_b_type = 'thing'
   AND    l.user_id = '1234'
   AND    l.created_at < some_time
   );
Detailed explanation (chapter "2a. LATERAL join"):
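The key to performance is matching indexes. Which indexes are best always depends on the complete picture, but these should make the query very fast: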
CREATE INDEX links_idx1 ON links (user_id, entity_b_type, created_at);
CREATE INDEX links_idx2 ON links (b_id, created_at);
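Like your original, this first checks whether the first links.created_at for the given predicates entity_b_type = 'thing' AND user_id = '1234' is old enough, and then proceeds to take the oldest rows per b_id regardless of those predicates. That seems dubious. If it is a mistake, the query might be simplified further.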
Untested. It is hard to say more without basic information.
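One possible reading of the "simplified further" remark, sketched here purely as an assumption: if the predicates on entity_b_type and user_id are meant to apply to every row and not only to the existence check, the row number and the group's first timestamp can both be computed as window functions in a single subquery, and the range filter and the group ordering from the question then need no second select. The sketch again uses Python's sqlite3 (SQLite 3.25 or later) in place of PostgreSQL, with an assumed minimal schema, integer timestamps, and hypothetical names (first_ts, top_n_in_range). It says nothing about what the PostgreSQL planner actually does with such a query.

```python
import sqlite3

# Assumed variant: the predicates filter ALL rows, so rows of other users never
# enter the ranking. first_ts is the earliest created_at within each b_id group.
SINGLE_SELECT_QUERY = """
SELECT b_id, created_at
FROM  (
   SELECT b_id, created_at,
          row_number()    OVER (PARTITION BY b_id ORDER BY created_at) AS rn,
          min(created_at) OVER (PARTITION BY b_id)                     AS first_ts
   FROM   links
   WHERE  entity_b_type = 'thing'
   AND    user_id = '1234'
   ) ranked
WHERE  rn <= :n
AND    first_ts > :lo
AND    first_ts < :hi
ORDER  BY first_ts, b_id, rn
"""


def build_demo_db():
    """In-memory links table with the question's sample groups (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links (b_id TEXT, entity_b_type TEXT, user_id TEXT, created_at INTEGER)"
    )
    sample = {"A": [1, 13, 99], "B": [10, 20, 21, 30], "C": [50, 51, 60], "D": [56, 70, 75]}
    rows = [(b_id, "thing", "1234", ts) for b_id, stamps in sample.items() for ts in stamps]
    # Hypothetical extra row from another user: under this variant it is ignored,
    # so it neither becomes C's first item nor takes one of C's three slots.
    rows.append(("C", "thing", "9999", 3))
    conn.executemany("INSERT INTO links VALUES (?, ?, ?, ?)", rows)
    return conn


def top_n_in_range(conn, lo, hi, n):
    """Return [(b_id, [timestamps])], groups ordered by their first timestamp."""
    groups = {}  # dicts keep insertion order, which follows the query's ORDER BY
    params = {"lo": lo, "hi": hi, "n": n}
    for b_id, created_at in conn.execute(SINGLE_SELECT_QUERY, params):
        groups.setdefault(b_id, []).append(created_at)
    return list(groups.items())


if __name__ == "__main__":
    print(top_n_in_range(build_demo_db(), lo=5, hi=55, n=3))
```

For the range ts > 5 AND ts < 55 from the question, this keeps only B and C, in that order. Whether filtering every row by the predicates matches the intended semantics is exactly the open point raised above, so treat this as an illustration and not as a drop-in replacement.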