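From joined tables

Time: 2016-06-20 05:47:35

Tags: postgresql

I have an accounts table, a units table, and a reports table. An account has many units (the foreign key on units is account_id), and a unit has many reports (the foreign key on reports is unit_id). I want to select the account name, the total number of units for that account, and the last reported time: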

SELECT accounts.name AS account_name, 
COUNT(units.id) AS unit_count,  
(SELECT reports.time FROM reports INNER JOIN units ON units.id = reports.unit_id  ORDER BY time desc LIMIT 1) AS last_reported_time
FROM accounts 
INNER JOIN units ON accounts.id = units.account_id 
INNER JOIN reports ON units.id = reports.unit_id
GROUP BY account_name, last_reported_time 
ORDER BY unit_count desc;

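This query runs forever, and I am not sure whether it does what I intend.

An account has many units, and a unit has many reports. I want to display the time of the most recent report across all units associated with each given account. Is this query correct? If not, how can I accomplish my task (without using a scripting language, if possible)?

Result of EXPLAIN: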

 Sort  (cost=21466114.58..21466547.03 rows=172980 width=38)
   Sort Key: (count(public.units.id))
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..12.02 rows=1 width=8)
           ->  Nested Loop  (cost=0.00..928988485.04 rows=77309416 width=8)
                 ->  Index Scan Backward using index_reports_on_time on reports  (cost=0.00..296291138.34 rows=77309416 width=12)
                 ->  Index Scan using units_pkey on units  (cost=0.00..8.17 rows=1 width=4)
                       Index Cond: (public.units.id = public.reports.unit_id)
   ->  GroupAggregate  (cost=20807359.99..21446321.09 rows=172980 width=38)
         ->  Sort  (cost=20807359.99..20966559.70 rows=63679885 width=38)
               Sort Key: accounts.name, public.units.last_reported_time
               ->  Hash Join  (cost=975.50..3846816.82 rows=63679885 width=38)
                     Hash Cond: (public.reports.unit_id = public.units.id)
                     ->  Seq Scan on reports  (cost=0.00..2919132.16 rows=77309416 width=4)
                     ->  Hash  (cost=961.43..961.43 rows=1126 width=38)
                           ->  Hash Join  (cost=16.37..961.43 rows=1126 width=38)
                                 Hash Cond: (public.units.account_id = accounts.id)
                                 ->  Seq Scan on units  (cost=0.00..928.67 rows=1367 width=28)
                                 ->  Hash  (cost=11.72..11.72 rows=372 width=18)
                                       ->  Seq Scan on accounts  (cost=0.00..11.72 rows=372 width=18)

1 answer:

Answer 0: (score: 0)

About 95% of the query's cost is here:

     ->  Sort  (cost=20807359.99..20966559.70 rows=63679885 width=38)
           Sort Key: accounts.name, public.units.last_reported_time
           ->  Hash Join  (cost=975.50..3846816.82 rows=63679885 width=38)
                 Hash Cond: (public.reports.unit_id = public.units.id)
                 ->  Seq Scan on reports  (cost=0.00..2919132.16 rows=77309416 width=4)

Do you have an index on reports.unit_id? If not, you should definitely add one.
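
If that index is missing, it could be created along these lines. This is only a sketch: the index name is made up (modelled on the index_reports_on_time name visible in the plan), and CONCURRENTLY is an optional choice for a table with tens of millions of rows.

```sql
-- Hypothetical index name; the column comes from the join condition in the plan.
-- CONCURRENTLY builds the index without blocking writes to reports,
-- but it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY index_reports_on_unit_id ON reports (unit_id);

-- Refresh planner statistics afterwards, then re-check the plan.
ANALYZE reports;
```

Whether the planner actually uses the index for a query that reads every report is worth confirming by running EXPLAIN again.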

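Beyond that, the output column unit_count appears to give the number of units per account, but computing it after all the joins and then ordering by it is very wasteful. The subquery in the select list is a mystery to me; I assume you want the most recent report time for each account's units, but because it is not tied to the outer query's account it only gives you the single latest report time across all units combined. Try this: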

SELECT a.name AS account_name, u.unit_count, r.last_reported_time
FROM accounts a
JOIN ( -- one row per account: number of units
    SELECT account_id, COUNT(*) AS unit_count
    FROM units
    GROUP BY 1) u ON u.account_id = a.id
LEFT JOIN ( -- allow for units that have not yet submitted a report
    SELECT u.account_id, max(r.time) AS last_reported_time
    FROM reports r
    JOIN units u ON u.id = r.unit_id
    GROUP BY 1) r ON r.account_id = a.id
ORDER BY 2 DESC;
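
One way to sanity-check this shape of query is to run it against a few throwaway rows in a scratch session. The sketch below assumes PostgreSQL and the table and column names from the question (accounts.name, units.account_id, reports.unit_id, reports.time); the column types and the sample data are made up for illustration.

```sql
-- Throwaway tables; names follow the question, types are assumptions.
-- Temporary tables shadow same-named permanent tables for this session only.
CREATE TEMP TABLE accounts (id int PRIMARY KEY, name text);
CREATE TEMP TABLE units    (id int PRIMARY KEY, account_id int);
CREATE TEMP TABLE reports  (id int PRIMARY KEY, unit_id int, time timestamp);

INSERT INTO accounts VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO units    VALUES (1, 1), (2, 1), (3, 2);  -- unit 3 never reports
INSERT INTO reports  VALUES
    (1, 1, '2016-06-01 10:00'),
    (2, 2, '2016-06-02 09:00');

SELECT a.name AS account_name, u.unit_count, r.last_reported_time
FROM accounts a
JOIN (
    SELECT account_id, COUNT(*) AS unit_count
    FROM units
    GROUP BY 1) u ON u.account_id = a.id
LEFT JOIN (
    SELECT u.account_id, max(r.time) AS last_reported_time
    FROM reports r
    JOIN units u ON u.id = r.unit_id
    GROUP BY 1) r ON r.account_id = a.id
ORDER BY 2 DESC;
```

With this data, Acme should come back with 2 units and a last reported time of 2016-06-02 09:00:00, and Globex with 1 unit and a NULL last reported time, since its only unit has no reports and the LEFT JOIN keeps the account in the result.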