How can I make this SQL query reasonably fast?

Time: 2014-03-21 15:56:33

Tags: sql postgresql join left-join query-performance

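I want to optimize (read: make feasible at all) an SQL query.

The following PostgreSQL query retrieves the records I need. I (believe I) could confirm this by running the query on a small subset of the actual database.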

SELECT B.*, A1.foo, A1.bar, A2.foo, A2.bar FROM B LEFT JOIN A as A1 on B.n1_id = A1.n_id LEFT JOIN A as A2 on B.n2_id = A2.n_id WHERE B.l_id IN (
    SELECT l_id FROM C 
        WHERE l_id IN (
            SELECT l_id FROM B 
                WHERE n1_id IN (SELECT n_id FROM A WHERE foo BETWEEN foo_min AND foo_max AND bar BETWEEN bar_min AND bar_max)
            UNION
            SELECT l_id FROM B 
                WHERE n2_id IN (SELECT n_id FROM A WHERE foo BETWEEN foo_min AND foo_max AND bar BETWEEN bar_min AND bar_max)
            ) 
            AND (property1 = 'Y' OR property2 = 'Y')
    )

The relevant parts of the database are as follows:

table A:
n_id (PK);
foo, int (indexed);
bar, int (indexed);

table B:
l_id (PK);
n1_id (FK, indexed);
n2_id (FK, indexed);

table C:
l_id (PK, FK);
property1, char (indexed);
property2, char (indexed);

EXPLAIN tells me:

"Merge Join  (cost=6590667.27..10067376.97 rows=453419 width=136)"
"  Merge Cond: (A2.n_id = B.n2_id)"
"  ->  Index Scan using pk_A on A A2  (cost=0.57..3220265.29 rows=99883648 width=38)"
"  ->  Sort  (cost=6590613.72..6591747.27 rows=453419 width=98)"
"        Sort Key: B.n2_id"
"        ->  Merge Join  (cost=3071304.25..6548013.91 rows=453419 width=98)"
"              Merge Cond: (A1.n_id = B.n1_id)"
"              ->  Index Scan using pk_A on A A1  (cost=0.57..3220265.29 rows=99883648 width=38)"
"              ->  Sort  (cost=3071250.74..3072384.28 rows=453419 width=60)"
"                    Sort Key: B.n1_id"
"                    ->  Hash Semi Join  (cost=32475.31..3028650.92 rows=453419 width=60)"
"                          Hash Cond: (B.l_id = C.l_id)"
"                          ->  Seq Scan on B B  (cost=0.00..2575104.04 rows=122360504 width=60)"
"                          ->  Hash  (cost=26807.58..26807.58 rows=453419 width=16)"
"                                ->  Nested Loop  (cost=10617.22..26807.58 rows=453419 width=16)"
"                                      ->  HashAggregate  (cost=10616.65..10635.46 rows=1881 width=8)"
"                                            ->  Append  (cost=4081.76..10611.95 rows=1881 width=8)"
"                                                  ->  Nested Loop  (cost=4081.76..5383.92 rows=1078 width=8)"
"                                                        ->  Bitmap Heap Scan on A  (cost=4081.19..4304.85 rows=56 width=8)"
"                                                              Recheck Cond: ((bar >= bar_min) AND (bar <= bar_max) AND (foo >= foo_min) AND (foo <= foo_max))"
"                                                              ->  BitmapAnd  (cost=4081.19..4081.19 rows=56 width=0)"
"                                                                    ->  Bitmap Index Scan on A_bar_idx  (cost=0.00..740.99 rows=35242 width=0)"
"                                                                          Index Cond: ((bar >= bar_min) AND (bar <= bar_max))"
"                                                                    ->  Bitmap Index Scan on A_foo_idx  (cost=0.00..3339.93 rows=159136 width=0)"
"                                                                          Index Cond: ((foo >= foo_min) AND (foo <= foo_max))"
"                                                        ->  Index Scan using nx_B_n1 on B  (cost=0.57..19.08 rows=19 width=16)"
"                                                              Index Cond: (n1_id = A.n_id)"
"                                                  ->  Nested Loop  (cost=4081.76..5209.22 rows=803 width=8)"
"                                                        ->  Bitmap Heap Scan on A A_1  (cost=4081.19..4304.85 rows=56 width=8)"
"                                                              Recheck Cond: ((bar >= bar_min) AND (bar <= bar_max) AND (foo >= foo_min) AND (foo <= foo_max))"
"                                                              ->  BitmapAnd  (cost=4081.19..4081.19 rows=56 width=0)"
"                                                                    ->  Bitmap Index Scan on A_bar_idx  (cost=0.00..740.99 rows=35242 width=0)"
"                                                                          Index Cond: ((bar >= bar_min) AND (bar <= bar_max))"
"                                                                    ->  Bitmap Index Scan on A_foo_idx  (cost=0.00..3339.93 rows=159136 width=0)"
"                                                                          Index Cond: ((foo >= foo_min) AND (foo <= foo_max))"
"                                                        ->  Index Scan using nx_B_n2 on B B_1  (cost=0.57..16.01 rows=14 width=16)"
"                                                              Index Cond: (n2_id = A_1.n_id)"
"                                      ->  Index Scan using pk_C on C  (cost=0.57..8.58 rows=1 width=8)"
"                                            Index Cond: (l_id = B.l_id)"
"                                            Filter: ((property1 = 'Y'::bpchar) OR (property2 = 'Y'::bpchar))"

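All three tables have many millions of rows. I cannot change the table definitions. The WHERE l_id IN (SELECT l_id FROM B ... UNION ...) condition is very restrictive and returns < 100 results.

What can I do to execute the query in a reasonable amount of time (a few seconds at most)?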

EDIT: I forgot to select two columns in the outermost SELECT. The question has now been changed accordingly.

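UPDATE: This seems to be a tough problem, possibly because of the lack of information on my part. I would love to provide more information, but the database is proprietary and confidential.

I can retrieve the rows of B fairly quickly (0.1 s) with the following query: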

WITH relevant_a AS (
    SELECT * FROM A 
        WHERE
            foo BETWEEN foo_min AND foo_max 
            AND
            bar BETWEEN bar_min AND bar_max
),
relevant_c AS (
    SELECT * FROM C
        WHERE l_id IN (
            SELECT l_id FROM B
                WHERE n1_id IN (
                    SELECT n_id FROM relevant_a
                )
            UNION
            SELECT l_id FROM B
                WHERE n2_id IN (
                    SELECT n_id FROM relevant_a
                )
        )
        AND
        (property1 = 'Y' OR property2= 'Y')
),
relevant_b AS (
    SELECT * FROM B WHERE l_id IN (
        SELECT l_id FROM relevant_c
    )
)

SELECT * FROM relevant_b

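The join with A is the part where it becomes slow. The query returns < 100 records, so why does the join with A make it so slow? Do you have any ideas on how to make this simple join faster? Simply adding four columns of information from another table should not be this expensive.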

5 Answers:

Answer 0: (score: 2)

Or something like this:

SELECT B.*, A1.foo, A2.bar 
FROM B 
     LEFT JOIN A as A1 on B.n1_id = A1.n_id 
     LEFT JOIN A as A2 on B.n2_id = A2.n_id 
     INNER JOIN C on (C.l_id = B.l_id)
where 
     A1.foo between A1.foo_min AND A1.foo_max AND 
     A2.bar BETWEEN A2.bar_min AND A2.bar_max and
     b.foo between b.foo_min AND b.foo_max AND 
     b.bar BETWEEN b.bar_min AND bar_max   AND
     (C.property1 = 'Y' OR C.property2 = 'Y')

Answer 1: (score: 0)

  1. You can combine the two LEFT JOINs on A (A1, A2) into 1 LEFT JOIN, using OR in the ON clause
  2. Change the INs to EXISTS
  3. Replace the UNION with an OR, so that it becomes 1 query

    Try this...

        SELECT B.*
        , A1.foo
        , A2.bar
    FROM B
    LEFT JOIN A AS A1
        ON (
                B.n1_id = A1.n_id
                OR B.n2_id = A1.n_id
                )
    WHERE EXISTS (
            SELECT l_id
            FROM C
            WHERE EXISTS (
                    SELECT l_id
                    FROM B
                    WHERE EXISTS (
                            SELECT n_id
                            FROM A
                            WHERE foo BETWEEN foo_min
                                    AND foo_max
                                AND bar BETWEEN bar_min
                                    AND bar_max
                                AND (
                                    A.n_id = B.n1_id
                                    OR A.n_id = B.n2_id
                                    )
                            )
                        AND B.l_id = C.l_id
                    )
            )
    

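Answer 2: (score: 0)

As far as I can see, you select the Bs for which at least one of the two associated As lies within the given ranges. In addition, a matching C must exist for the B. You then show the B together with the foo and bar values of the two associated As.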

SELECT B.*, A1.foo, A2.bar
FROM B 
LEFT JOIN A A1 ON A1.n_id = B.n1_id
LEFT JOIN A A2 ON A2.n_id = B.n2_id
WHERE 
(
  (A1.foo BETWEEN foo_min AND foo_max AND A1.bar BETWEEN bar_min AND bar_max)
  OR
  (A2.foo BETWEEN foo_min AND foo_max AND A2.bar BETWEEN bar_min AND bar_max)
)
AND EXISTS
(
  SELECT *
  FROM C
  WHERE C.l_id = B.l_id
  AND (property1 = 'Y' OR property2 = 'Y')
);

Can B.n1_id and B.n2_id be NULL? If so, you need the left outer joins. Otherwise you can replace them with inner joins.
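
Whether the inner-join variant is safe depends on the data. The following is a minimal sketch of a check, assuming only the table and column names from the question; the output alias b_has_null_reference is made up for illustration.

```sql
-- Returns true if at least one row of B has a NULL in n1_id or n2_id.
-- In that case INNER JOINs would drop rows that the LEFT JOINs keep.
SELECT EXISTS (
    SELECT 1
    FROM B
    WHERE n1_id IS NULL
       OR n2_id IS NULL
) AS b_has_null_reference;
```

If this returns false and the foreign keys are enforced, the inner joins should return the same rows as the left joins. On a table of this size the check may itself take a while if the planner decides not to use the indexes on n1_id and n2_id.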

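EDIT: Oops, I missed the C criteria. I have amended the statement accordingly.

EDIT: In reaction to your comment, here is the same select with INNER JOINs and an IN clause: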

SELECT B.*, A1.foo, A2.bar
FROM B 
INNER JOIN A A1 ON A1.n_id = B.n1_id
INNER JOIN A A2 ON A2.n_id = B.n2_id
WHERE 
(
  (A1.foo BETWEEN foo_min AND foo_max AND A1.bar BETWEEN bar_min AND bar_max)
  OR
  (A2.foo BETWEEN foo_min AND foo_max AND A2.bar BETWEEN bar_min AND bar_max)
)
AND B.l_id IN
(
  SELECT l_id
  FROM C
  WHERE property1 = 'Y' OR property2 = 'Y'
);

Answer 3: (score: 0)

I got it to work with the following query:

WITH relevant_a AS (
    SELECT * FROM A 
        WHERE
            foo BETWEEN foo_min AND foo_max 
            AND
            bar BETWEEN bar_min AND bar_max
),
relevant_c AS (
    SELECT * FROM C
        WHERE l_id IN (
            SELECT l_id FROM B
                WHERE n1_id IN (
                    SELECT n_id FROM relevant_a
                )
            UNION
            SELECT l_id FROM B
                WHERE n2_id IN (
                    SELECT n_id FROM relevant_a
                )
        )
        AND
        (property1 = 'Y' OR property2= 'Y')
),
relevant_b AS (
    SELECT * FROM B WHERE l_id IN (
        SELECT l_id FROM relevant_c
    )
),
a1_data AS (
    SELECT A.n_id, A.foo, A.bar
    FROM A
    WHERE A.n_id IN (
        SELECT n1_id FROM relevant_b
    )
),
a2_data AS (
    SELECT A.n_id, A.foo, A.bar
    FROM A
    WHERE A.n_id IN (
        SELECT n2_id FROM relevant_b
    )
)

SELECT relevant_b.*, a1_data.foo,  a1_data.bar,  a2_data.foo,  a2_data.bar 
FROM relevant_b
LEFT JOIN a1_data ON relevant_b.n1_id = a1_data.n_id
LEFT JOIN a2_data ON relevant_b.n2_id = a2_data.n_id

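I don't like this solution because it seems forced and redundant. However, it gets the job done in < 0.1 seconds.

I still refuse to believe that an SQL noob like me can come up with a statement (rather easily, in hindsight) that forces the optimizer to use a far better strategy than the one it comes up with on its own. There has to be a better solution, but I won't look for it any more.

Anyway, thank you all for your suggestions; I definitely learned a few things along the way.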
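
A likely explanation for why this works: in PostgreSQL versions before 12, every WITH query is computed on its own and acts as an optimization fence, so the planner only ever joins A against the handful of rows already collected in relevant_b. From PostgreSQL 12 on, a side-effect-free WITH query that is referenced only once can be folded into the parent query unless it is marked MATERIALIZED. The following is a sketch of the same query for 12 and later, not a tested result on the original data. The numeric bounds are placeholders for foo_min, foo_max, bar_min and bar_max, and the output aliases foo1, bar1, foo2 and bar2 are made up for illustration.

```sql
-- Sketch for PostgreSQL 12+ (AS MATERIALIZED is a syntax error before 12).
-- The numeric bounds are placeholders for foo_min/foo_max/bar_min/bar_max.
WITH relevant_a AS MATERIALIZED (
    SELECT n_id
    FROM A
    WHERE foo BETWEEN 10000 AND 20000
      AND bar BETWEEN 0 AND 10000
),
relevant_c AS MATERIALIZED (
    SELECT l_id
    FROM C
    WHERE l_id IN (
            SELECT l_id FROM B WHERE n1_id IN (SELECT n_id FROM relevant_a)
            UNION
            SELECT l_id FROM B WHERE n2_id IN (SELECT n_id FROM relevant_a)
          )
      AND (property1 = 'Y' OR property2 = 'Y')
),
relevant_b AS MATERIALIZED (
    SELECT *
    FROM B
    WHERE l_id IN (SELECT l_id FROM relevant_c)
),
a1_data AS MATERIALIZED (
    SELECT n_id, foo, bar
    FROM A
    WHERE n_id IN (SELECT n1_id FROM relevant_b)
),
a2_data AS MATERIALIZED (
    SELECT n_id, foo, bar
    FROM A
    WHERE n_id IN (SELECT n2_id FROM relevant_b)
)
-- Distinct aliases keep the four A columns apart in the result set.
SELECT relevant_b.*,
       a1_data.foo AS foo1, a1_data.bar AS bar1,
       a2_data.foo AS foo2, a2_data.bar AS bar2
FROM relevant_b
LEFT JOIN a1_data ON relevant_b.n1_id = a1_data.n_id
LEFT JOIN a2_data ON relevant_b.n2_id = a2_data.n_id;
```

On versions before 12 the MATERIALIZED keywords must be left out; there the plain WITH form above already behaves this way.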

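Answer 4: (score: 0)

The columns foo and bar in table A should share a single index, so that it is optimal for both BETWEEN clauses (you will have to find out for yourself which column order is better). Table B is not optimally designed for this query: you should turn the values n1_id and n2_id into 2 separate rows with the same key and a column id (crosstab).

The following query should return the same data, with a big gain in performance.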
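
A minimal sketch of such a combined index, assuming you are allowed to create indexes even though the table definitions are fixed. The index name a_bar_foo_idx is hypothetical. The column order shown is only a guess: the EXPLAIN output in the question estimates fewer rows for the bar range (35242) than for the foo range (159136), which hints that bar may be the better leading column for these particular bounds.

```sql
-- Hypothetical index name; requires permission to create indexes on A.
-- CONCURRENTLY avoids blocking writes while the index is built,
-- but it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY a_bar_foo_idx ON A (bar, foo);
```

In a multicolumn B-tree only the range on the leading column narrows the part of the index that is scanned. The condition on the second column is still checked inside the index, which saves visits to the table itself. It is worth comparing both column orders against the real data.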

with b_norm(l_id, role, n_id) as (
        select l_id, unnest(Array['1','2']) as role, unnest(Array[n1_id, n2_id]) as n_id
        from b
    )
select *
from (
        select distinct l_id
        from a
            join b_norm using (n_id)
            join c using (l_id)
        where bar between 0 and 10000
            and foo between 10000 and 20000
            and (c.property1 = 'Y' or c.property2 = 'Y')
    ) as driver
    join b using (l_id)
    join (
        select a.n_id as n1_id, foo as foo1, bar as bar1
        from a
    )  as a1 using (n1_id)
    join (
        select a.n_id as n2_id, foo as foo2, bar as bar2
        from a
    ) as a2 using (n2_id);