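Simplified, I have the following situation. I have two tables. A migration has many checks via checks.migration_id. The column checks.old describes the kind of check. Now, for each migration, I want the check with the maximum time where old is true (query 1) and where old is false (query 2).
There are roughly 30,000 migrations, and each has around 1,000 checks where old = true and 1,000 checks where old = false. The checks table will get very large. The order of the checks is not given and can be completely mixed.
I would like to fetch the latest checks for at most 150 migrations at once.
SQL Fiddle: http://sqlfiddle.com/#!15/282ce/15
I use PostgreSQL 9.3 and Rails 3.2 (which should not matter much).
What is the most efficient way to get the latest child record where old = true?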
Table migrations:
| ID |
|----|
| 1 |
| 2 |
Table checks:
| ID | MIGRATION_ID | OLD | OK | TIME |
|----|--------------|-----|----|----------------------------------|
| 1 | 1 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 2 | 1 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 3 | 2 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 4 | 2 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 5 | 1 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 6 | 1 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 7 | 2 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 8 | 2 | 0 | 1 | September, 22 2014 12:00:04+0000 |
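For anyone who wants to reproduce the sample data without the fiddle, a schema along these lines should work. This is only a sketch: the column types (boolean for old and ok, timestamptz for time) and the NOT NULL constraints are my assumptions from the data shown, not the exact definitions from the fiddle.

```sql
-- Assumed schema; types and constraints are guesses based on the sample data.
CREATE TABLE migrations (
  id integer PRIMARY KEY
);

CREATE TABLE checks (
  id           integer PRIMARY KEY,
  migration_id integer NOT NULL REFERENCES migrations (id),
  old          boolean NOT NULL,
  ok           boolean NOT NULL,
  "time"       timestamptz NOT NULL
);

INSERT INTO migrations (id) VALUES (1), (2);

-- Same eight rows as in the table above.
INSERT INTO checks (id, migration_id, old, ok, "time") VALUES
  (1, 1, true,  true, '2014-09-22 12:00:01+00'),
  (2, 1, false, true, '2014-09-22 12:00:02+00'),
  (3, 2, true,  true, '2014-09-22 12:00:01+00'),
  (4, 2, false, true, '2014-09-22 12:00:02+00'),
  (5, 1, true,  true, '2014-09-22 12:00:03+00'),
  (6, 1, false, true, '2014-09-22 12:00:04+00'),
  (7, 2, true,  true, '2014-09-22 12:00:03+00'),
  (8, 2, false, true, '2014-09-22 12:00:04+00');
```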
Query 1 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 5 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 2 | 7 | 1 | 1 | September, 22 2014 12:00:03+0000 |
Query 2 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 6 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 2 | 8 | 0 | 1 | September, 22 2014 12:00:04+0000 |
I tried to solve it with max in a subquery, but then I lose the information about checks.ok and checks.time.
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 't') AS latest FROM migrations eq;
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 'f') AS latest FROM migrations eq;
(I know that this gives me max(id) and not max(time).)
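In Rails, I tried fetching the latest record for each migration, which results in a 1 + n query problem. I cannot eager-load all checks because there are far too many of them.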
Answer 0 (score: 1)
A simple solution with the Postgres-specific DISTINCT ON:
Query 1 ("for each migration, the check with the maximum time where old is true"):
SELECT DISTINCT ON (migration_id)
migration_id, id AS check_id, old, ok, time
FROM checks
WHERE old
ORDER BY migration_id, time DESC;
For query 2, invert the WHERE condition:
...
WHERE NOT old
...
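Written out in full, query 2 could look like the sketch below. It assumes the table and column names from the question. The extra id DESC sort key is my addition, used as a tie-breaker so the result stays deterministic when two checks of the same migration share the same time.

```sql
-- Query 2: per migration, the check with the greatest time where old is false.
SELECT DISTINCT ON (migration_id)
       migration_id, id AS check_id, old, ok, "time"
FROM   checks
WHERE  NOT old
ORDER  BY migration_id, "time" DESC, id DESC;  -- id DESC: assumed tie-breaker
```

With the sample data from the question, this picks checks 6 and 8, matching the expected result for query 2.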
However, if you want better read performance on big tables, use JOIN LATERAL (Postgres 9.3+, standard SQL), building on a multicolumn index like:
CREATE INDEX checks_special_idx ON checks(old, migration_id, time DESC);
Query 1:
SELECT m.id AS migration_id
, c.id AS check_id, c.old, c.ok, c.time
FROM migrations m
-- FROM (SELECT id FROM migrations LIMIT 150) m
JOIN LATERAL (
SELECT id, old, ok, time
FROM checks
WHERE migration_id = m.id
AND old
ORDER BY time DESC
LIMIT 1
) c ON TRUE;
Again, switch the condition on old for query 2.
For the unspecified "at most 150 migrations", use the commented alternative line.
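Putting both remarks together, query 2 restricted to 150 migrations might look like the sketch below. Two things here are assumptions of mine. The question does not say which 150 migrations are wanted, so ORDER BY id is only a placeholder. LEFT JOIN LATERAL is used instead of JOIN LATERAL so that migrations without any matching check still appear, with NULL check columns; keep the plain JOIN if such migrations should be dropped.

```sql
-- Query 2 for at most 150 migrations, using the index on (old, migration_id, time DESC).
SELECT m.id AS migration_id
     , c.id AS check_id, c.old, c.ok, c."time"
FROM  (
   SELECT id
   FROM   migrations
   ORDER  BY id        -- assumption: which 150 migrations to pick is not specified
   LIMIT  150
   ) m
LEFT   JOIN LATERAL (  -- LEFT JOIN keeps migrations that have no matching check
   SELECT id, old, ok, "time"
   FROM   checks
   WHERE  migration_id = m.id
   AND    NOT old
   ORDER  BY "time" DESC
   LIMIT  1
   ) c ON TRUE;
```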
Aside: don't use "time" as an identifier. It is a reserved word in standard SQL and the name of a basic type in Postgres.
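If renaming is an option, something like the following would do it. The new name checked_at is purely hypothetical; any non-reserved name works. Queries and application code that reference the old column name would need to be updated as well.

```sql
-- Rename the column to a name that is not a keyword (checked_at is a made-up example).
ALTER TABLE checks RENAME COLUMN "time" TO checked_at;
```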