Is it possible to push a LIMIT clause down into the subqueries?

Asked: 2011-04-18 19:45:15

Tags: postgresql limit

I am joining against the results of this query:

  SELECT
      twitter_personas.id
    , 'TwitterPersona'
    , twitter_personas.name
  FROM twitter_personas
UNION ALL
  SELECT
      facebook_personas.id
    , 'FacebookPersona'
    , facebook_personas.name
  FROM facebook_personas
-- and more UNION ALL statements pertaining to my other services

on a scores table (xs). The JOIN itself is not the problem, but the query plan is "wrong": PostgreSQL finds the top 50 scores and then joins them against the full view above, which means it does an enormous amount of work even though we are only interested in 50 rows. And 50 is a variable: it may change (depending on UI concerns, and the results may be paginated at some point, yada, yada).
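For context, the outer query has roughly this shape (a sketch only; `personas_view` is an assumed name for the UNION ALL view above, and `xs` is the scores table):

```sql
-- Sketch of the slow outer query: the planner joins the *entire* view
-- before applying the LIMIT.
SELECT
    personas_view.id
  , personas_view.type
  , personas_view.name
  , xs.value
FROM personas_view
  INNER JOIN xs ON xs.persona_id = personas_view.id
ORDER BY
  xs.value DESC
LIMIT 50;
```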

I made the query fast by restricting the result set directly inside each subquery:

SELECT
    personas.id
  , personas.type
  , personas.name
  , xs.value
FROM (
  SELECT
      twitter_personas.id
    , 'TwitterPersona'
    , twitter_personas.name
  FROM twitter_personas
  WHERE id IN (
    SELECT persona_id
    FROM xs
    ORDER BY
      xs.value DESC
    LIMIT 50)
UNION ALL
  SELECT
      facebook_personas.id
    , 'FacebookPersona'
    , facebook_personas.name
  FROM facebook_personas
  WHERE id IN (
    SELECT persona_id
    FROM xs
    ORDER BY
      xs.value DESC
    LIMIT 50)) AS personas(id, type, name)
  INNER JOIN xs ON xs.persona_id = personas.id
ORDER BY
  xs.value DESC
LIMIT 50

My question is: how can I get the `LIMIT 50` of the outer query distributed into the inner queries automatically? This query executes very fast (90 ms), compared with the original, which merges the complete result set of the UNION ALL and takes 15 seconds. Or is there perhaps a better way to do this?

Here are my query plans for reference. First, the "bad" one, taking almost 15 seconds:

                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.00..31072.27 rows=50 width=176) (actual time=304.299..14403.551 rows=50 loops=1)
  ->  Subquery Scan personas_ranked  (cost=0.00..253116556.67 rows=407303 width=176) (actual time=304.298..14403.511 rows=50 loops=1)
        ->  Nested Loop Left Join  (cost=0.00..253112483.64 rows=407303 width=112) (actual time=304.297..14403.474 rows=50 loops=1)
              ->  Nested Loop  (cost=0.00..252998394.22 rows=407303 width=108) (actual time=304.283..14402.815 rows=50 loops=1)
                    Join Filter: ("*SELECT* 1".id = xs.persona_id)
                    ->  Index Scan Backward using xs_value_index on xs xs  (cost=0.00..459.97 rows=10275 width=12) (actual time=0.013..0.208 rows=50 loops=1)
                    ->  Append  (cost=0.00..15458.35 rows=407303 width=88) (actual time=0.006..244.217 rows=398435 loops=50)
                          ->  Subquery Scan "*SELECT* 1"  (cost=0.00..15420.65 rows=406562 width=88) (actual time=0.006..199.945 rows=398434 loops=50)
                                ->  Seq Scan on twitter_personas  (cost=0.00..11355.02 rows=406562 width=88) (actual time=0.005..134.607 rows=398434 loops=50)
                          ->  Subquery Scan "*SELECT* 2"  (cost=0.00..14.88 rows=150 width=502) (actual time=0.002..0.002 rows=0 loops=49)
                                ->  Seq Scan on email_personas  (cost=0.00..13.38 rows=150 width=502) (actual time=0.001..0.001 rows=0 loops=49)
                          ->  Subquery Scan "*SELECT* 3"  (cost=0.00..21.80 rows=590 width=100) (actual time=0.001..0.001 rows=0 loops=49)
                                ->  Seq Scan on facebook_personas  (cost=0.00..15.90 rows=590 width=100) (actual time=0.001..0.001 rows=0 loops=49)
                          ->  Subquery Scan "*SELECT* 4"  (cost=0.00..1.03 rows=1 width=25) (actual time=0.018..0.019 rows=1 loops=49)
                                ->  Seq Scan on web_personas  (cost=0.00..1.02 rows=1 width=25) (actual time=0.017..0.018 rows=1 loops=49)
              ->  Index Scan using people_personas_pkey on people_personas  (cost=0.00..0.27 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=50)
                    Index Cond: (people_personas.persona_id = "*SELECT* 1".id)
Total runtime: 14403.711 ms

The rewritten query, taking only 90 ms:

                                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=2830.93..2831.05 rows=50 width=108) (actual time=83.914..83.925 rows=50 loops=1)
  ->  Sort  (cost=2830.93..2832.30 rows=551 width=108) (actual time=83.912..83.918 rows=50 loops=1)
        Sort Key: xs.value
        Sort Method:  top-N heapsort  Memory: 28kB
        ->  Hash Join  (cost=875.60..2812.62 rows=551 width=108) (actual time=8.394..79.326 rows=10275 loops=1)
              Hash Cond: ("*SELECT* 1".id = xs.persona_id)
              ->  Append  (cost=588.41..2509.59 rows=551 width=4) (actual time=5.078..69.901 rows=10275 loops=1)
                    ->  Subquery Scan "*SELECT* 1"  (cost=588.41..1184.14 rows=200 width=4) (actual time=5.078..42.428 rows=10274 loops=1)
                          ->  Nested Loop  (cost=588.41..1182.14 rows=200 width=4) (actual time=5.078..40.220 rows=10274 loops=1)
                                ->  HashAggregate  (cost=588.41..590.41 rows=200 width=4) (actual time=5.066..7.900 rows=10275 loops=1)
                                      ->  Index Scan Backward using xs_value_index on xs xs  (cost=0.00..459.97 rows=10275 width=12) (actual time=0.005..2.079 rows=10275 loops=1)
                                ->  Index Scan using twitter_personas_id_index on twitter_personas  (cost=0.00..2.95 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=10275)
                                      Index Cond: (twitter_personas.id = xs.persona_id)
                    ->  Subquery Scan "*SELECT* 2"  (cost=588.41..649.27 rows=200 width=4) (actual time=13.017..13.017 rows=0 loops=1)
                          ->  Nested Loop  (cost=588.41..647.27 rows=200 width=4) (actual time=13.016..13.016 rows=0 loops=1)
                                ->  HashAggregate  (cost=588.41..590.41 rows=200 width=4) (actual time=5.267..6.909 rows=10275 loops=1)
                                      ->  Index Scan Backward using xs_value_index on xs xs  (cost=0.00..459.97 rows=10275 width=12) (actual time=0.007..2.292 rows=10275 loops=1)
                                ->  Index Scan using facebook_personas_id_index on facebook_personas  (cost=0.00..0.27 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=10275)
                                      Index Cond: (facebook_personas.id = xs.persona_id)
                    ->  Subquery Scan "*SELECT* 3"  (cost=588.41..648.77 rows=150 width=4) (actual time=12.568..12.568 rows=0 loops=1)
                          ->  Nested Loop  (cost=588.41..647.27 rows=150 width=4) (actual time=12.566..12.566 rows=0 loops=1)
                                ->  HashAggregate  (cost=588.41..590.41 rows=200 width=4) (actual time=5.015..6.538 rows=10275 loops=1)
                                      ->  Index Scan Backward using xs_value_index on xs xs  (cost=0.00..459.97 rows=10275 width=12) (actual time=0.002..2.065 rows=10275 loops=1)
                                ->  Index Scan using email_personas_id_index on email_personas  (cost=0.00..0.27 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=10275)
                                      Index Cond: (email_personas.id = xs.persona_id)
                    ->  Subquery Scan "*SELECT* 4"  (cost=0.00..27.41 rows=1 width=4) (actual time=0.629..0.630 rows=1 loops=1)
                          ->  Nested Loop Semi Join  (cost=0.00..27.40 rows=1 width=4) (actual time=0.628..0.628 rows=1 loops=1)
                                Join Filter: (web_personas.id = xs.persona_id)
                                ->  Seq Scan on web_personas  (cost=0.00..1.01 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
                                ->  Index Scan Backward using xs_value_index on xs xs  (cost=0.00..459.97 rows=10275 width=12) (actual time=0.002..0.421 rows=1518 loops=1)
              ->  Hash  (cost=158.75..158.75 rows=10275 width=12) (actual time=3.307..3.307 rows=10275 loops=1)
                    ->  Seq Scan on xs xs  (cost=0.00..158.75 rows=10275 width=12) (actual time=0.006..1.563 rows=10275 loops=1)
Total runtime: 84.066 ms

2 Answers:

Answer 0 (score: 2)

The planner has to join all rows from xs against every table in the UNION, because it cannot know in advance that the join will not affect the result set (which could change which rows end up in the top 50).

Could you do it in two steps with a temporary table?

create temporary table top50 as
select xs.persona_id
, xs.value
from xs
order by value desc
limit 50;

select *
from top50
join personas_view on top50.persona_id = personas_view.id;
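The same two-step idea can be expressed with a CTE instead of an explicit temporary table. In PostgreSQL versions before 12 a `WITH` query acts as an optimization fence, so `top50` is materialized first, just as with the temporary table (a sketch, reusing the `personas_view` name from above):

```sql
-- CTE variant: the fence keeps the planner from joining xs against the
-- full view before the LIMIT is applied.
WITH top50 AS (
  SELECT xs.persona_id
       , xs.value
  FROM xs
  ORDER BY xs.value DESC
  LIMIT 50
)
SELECT *
FROM top50
  JOIN personas_view ON top50.persona_id = personas_view.id;
```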

Answer 1 (score: 2)

The reason this doesn't work automatically is that `ORDER BY xs.value DESC` is (logically) processed before the LIMIT, and to know the first (or last) 50 entries, it must logically compute all of them first. If you push the limit into the branches of the union, you only get those top-50 entries that are already in the top 50 for their persona type, which could be a different set. If that is acceptable to you, you can rewrite the query by hand as you did, but the database system cannot do it for you.
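The manual per-branch push-down the answer alludes to looks roughly like this (a sketch under the assumption that each branch can be joined to xs and ordered by the same score column):

```sql
-- Each branch keeps only its own top 50 by score; the outer query then
-- takes the global top 50 of the survivors. Note the changed semantics:
-- rows outside a branch's own top 50 are discarded before the merge.
SELECT id, type, name, value
FROM (
    (SELECT t.id, 'TwitterPersona' AS type, t.name, xs.value
     FROM twitter_personas t
       JOIN xs ON xs.persona_id = t.id
     ORDER BY xs.value DESC
     LIMIT 50)
  UNION ALL
    (SELECT f.id, 'FacebookPersona' AS type, f.name, xs.value
     FROM facebook_personas f
       JOIN xs ON xs.persona_id = f.id
     ORDER BY xs.value DESC
     LIMIT 50)
) AS personas
ORDER BY value DESC
LIMIT 50;
```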