我有一个帐户表,单位表和报告表。一个帐户有很多单位(单位的外键是account_id),一个单位有很多报表(报表的外键是unit_id)。我想选择帐户名称,该帐户的总单位数以及上次报告时间:
SELECT accounts.name AS account_name,
COUNT(units.id) AS unit_count,
(SELECT reports.time FROM reports INNER JOIN units ON units.id = reports.unit_id ORDER BY time desc LIMIT 1) AS last_reported_time
FROM accounts
INNER JOIN units ON accounts.id = units.account_id
INNER JOIN reports ON units.id = reports.unit_id
GROUP BY account_name, last_reported_time
ORDER BY unit_count desc;
此查询一直在运行,我不确定它是否按预期执行。
一个帐户有很多单位,一个单位有很多报告。我想显示与每个给定帐户关联的所有单元的最新报告的时间。这个查询是否正确?如果没有,我如何完成我的任务(如果可能,不使用脚本语言)?
EXPLAIN的结果:
Sort (cost=21466114.58..21466547.03 rows=172980 width=38)
Sort Key: (count(public.units.id))
InitPlan 1 (returns $0)
-> Limit (cost=0.00..12.02 rows=1 width=8)
-> Nested Loop (cost=0.00..928988485.04 rows=77309416 width=8)
-> Index Scan Backward using index_reports_on_time on reports (cost=0.00..296291138.34 rows=77309416 width=12)
-> Index Scan using units_pkey on units (cost=0.00..8.17 rows=1 width=4)
Index Cond: (public.units.id = public.reports.unit_id)
-> GroupAggregate (cost=20807359.99..21446321.09 rows=172980 width=38)
-> Sort (cost=20807359.99..20966559.70 rows=63679885 width=38)
Sort Key: accounts.name, public.units.last_reported_time
-> Hash Join (cost=975.50..3846816.82 rows=63679885 width=38)
Hash Cond: (public.reports.unit_id = public.units.id)
-> Seq Scan on reports (cost=0.00..2919132.16 rows=77309416 width=4)
-> Hash (cost=961.43..961.43 rows=1126 width=38)
-> Hash Join (cost=16.37..961.43 rows=1126 width=38)
Hash Cond: (public.units.account_id = accounts.id)
-> Seq Scan on units (cost=0.00..928.67 rows=1367 width=28)
-> Hash (cost=11.72..11.72 rows=372 width=18)
-> Seq Scan on accounts (cost=0.00..11.72 rows=372 width=18)
答案 0 :(得分:0)
查询的大约95%的费用在这里:
-> Sort (cost=20807359.99..20966559.70 rows=63679885 width=38)
Sort Key: accounts.name, public.units.last_reported_time
-> Hash Join (cost=975.50..3846816.82 rows=63679885 width=38)
Hash Cond: (public.reports.unit_id = public.units.id)
-> Seq Scan on reports (cost=0.00..2919132.16 rows=77309416 width=4)
你有reports.unit_id
的索引吗?如果没有,你一定要添加一个。
除此之外,输出列unit_count
似乎给出了每个帐户的单元数,但在所有连接之后计算然后按顺序排序是非常浪费的。选择列表中的子查询对我来说是个神秘的东西;我认为你想要每个单位的最近报告时间,但它只会给你最后一次报告所有单位的总和。试试这个:
SELECT a.account_name, u.unit_count, r.last_reported_time
FROM account a
JOIN (
SELECT account_id, COUNT(*) AS unit_count
FROM units
GROUP BY 1) u ON u.account_id = a.id
LEFT JOIN ( -- allow for units that have not yet submitted a report
SELECT u.account_id, max(r.time) AS last_reported_time
FROM reports r
JOIN units u ON u.id = r.unit_id
GROUP BY 1) r ON r.account_id = a.id
ORDER BY 2 DESC;