Why is my subquery executed even for rows that are filtered out?

Asked: 2011-11-22 13:47:30

Tags: sql postgresql

My query looks something like this (note: the actual query is generated by Hibernate and is somewhat more complex):

select * from outage_revisions orev
join outages o
    on orev.outage=o.id
    where o.observed_end is null
    and orev.observation_date =
        (select max(observation_date)
            from outage_revisions orev2
            where orev2.observation_date <= '2011-11-21 00:00:00'
            and orev2.outage = orev.outage);

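This query runs very slowly (about 15 minutes). However, if I remove the part of the WHERE clause that contains the subquery, it returns almost immediately (about 83 ms) with only about 14 rows.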

In addition, the subquery on its own is very fast (about 31 ms):

select max(observation_date) from outage_revisions orev2
where orev2.observation_date <= '2011-11-21 00:00:00'
and orev2.outage = 1

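My question is: if the full query minus the subquery filter returns only 14 rows, why does adding the subquery make the query so much slower? Shouldn't the subquery add at most about 31 * 14 ms?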

Here is the plan for the full query:

Nested Loop  (cost=0.00..71078813.16 rows=1 width=115)
   ->  Seq Scan on outagerevisions orev  (cost=0.00..71077624.67 rows=284 width=79)
         Filter: (observationdate = (SubPlan 2))
         SubPlan 2
           ->  Result  (cost=1250.56..1250.57 rows=1 width=0)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.00..1250.56 rows=1 width=8)
                         ->  Index Scan Backward using idx_observationdate on outagerevisions orev2  (cost=0.00..2501.12 rows=2 width=8)
                               Index Cond: (observationdate <= '2011-11-21 00:00:00'::timestamp without time zone)
                               Filter: ((observationdate IS NOT NULL) AND (outage = $0))
   ->  Index Scan using outages_pkey on outages o  (cost=0.00..4.17 rows=1 width=36)
         Index Cond: (o.id = orev.outage)
         Filter: (o.observedend IS NULL)

1 answer:

Answer 0 (score: 3)

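My guess is that PostgreSQL is simply making a poor choice about how to execute the query. Although it seems it should narrow the result down to 9 rows before executing the correlated subquery, it probably isn't doing that, so the subquery has to run 60,000 times. While it does that, it also has to keep track of which rows carry on to the next step, and so on.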
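The posted plan is consistent with this guess: the `Filter: (observationdate = (SubPlan 2))` line sits on the sequential scan of the revisions table, below the nested loop that checks `observedend IS NULL`, so the subquery is evaluated for each scanned revision row before the join can discard any of them. One way to confirm how often it really runs is `EXPLAIN ANALYZE`, sketched here against the simplified query from the question (the real Hibernate-generated query may differ):

```sql
-- Sketch only: EXPLAIN ANALYZE actually executes the statement, so expect
-- this to take as long as the slow query itself (about 15 minutes here).
EXPLAIN ANALYZE
SELECT *
FROM outage_revisions orev
JOIN outages o
    ON orev.outage = o.id
WHERE o.observed_end IS NULL
  AND orev.observation_date =
      (SELECT max(observation_date)
       FROM outage_revisions orev2
       WHERE orev2.observation_date <= '2011-11-21 00:00:00'
         AND orev2.outage = orev.outage);
-- In the output, check the "loops=" figure on the nodes under the SubPlan:
-- it reports how many times the subquery was executed.
```

If the loop count is close to the number of rows in the revisions table rather than to the handful of open outages, the per-row evaluation is what makes the query slow.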

Here are a couple of other ways you could try writing it:

SELECT
    <column list>
FROM
    Outage_Revisions OREV
JOIN Outages O ON
    OREV.outage = O.id
LEFT OUTER JOIN Outage_Revisions OREV2 ON
    OREV2.outage = OREV.outage AND
    OREV2.observation_date <= '2011-11-21 00:00:00' AND
    OREV2.observation_date > OREV.observation_date
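WHERE
    O.observed_end IS NULL AND
    -- Added: without this, revisions dated after the cutoff find no later
    -- match in OREV2 and would also be returned.
    OREV.observation_date <= '2011-11-21 00:00:00' AND
    OREV2.outage IS NULL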

Or (assuming PostgreSQL and Hibernate support joining to a subquery):

SELECT
    <column list>
FROM
    Outage_Revisions OREV
JOIN Outages O ON
    OREV.outage = O.id
JOIN (SELECT OREV2.outage, MAX(OREV2.observation_date) AS max_observation_date
      FROM Outage_Revisions OREV2
      WHERE OREV2.observation_date <= '2011-11-21 00:00:00'
      GROUP BY OREV2.outage) SQ ON
    SQ.outage = OREV.outage AND
    SQ.max_observation_date = OREV.observation_date
WHERE
    O.observed_end IS NULL

You can experiment with the order of the joins in that last query.
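As a rough sketch of what such an experiment could look like: PostgreSQL normally chooses the inner-join order itself, so rewriting the `JOIN` clauses alone may not change the plan. Setting `join_collapse_limit` to 1 for the session makes the planner keep explicit joins in the order they are written. The ordering below (open outages first, then the grouped subquery, then the revisions) is just one candidate to compare, and `SELECT *` stands in for the column list:

```sql
-- Keep explicit JOINs in the written order for this session.
SET join_collapse_limit = 1;

SELECT *
FROM
    Outages O
JOIN (SELECT OREV2.outage, MAX(OREV2.observation_date) AS max_observation_date
      FROM Outage_Revisions OREV2
      WHERE OREV2.observation_date <= '2011-11-21 00:00:00'
      GROUP BY OREV2.outage) SQ ON
    SQ.outage = O.id
JOIN Outage_Revisions OREV ON
    OREV.outage = SQ.outage AND
    OREV.observation_date = SQ.max_observation_date
WHERE
    O.observed_end IS NULL;

-- Restore the default planner behaviour.
RESET join_collapse_limit;
```

Comparing the `EXPLAIN` output of each ordering shows whether the change actually alters the plan before committing to one form.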