Different query plans in Postgres 8.4 and 9.4

Date: 2015-07-07 04:24:36

Tags: postgresql upgrade sql-execution-plan

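This is a simplified distillation of a more complex situation in our production environment. The data and setup for this test case are available here: https://drive.google.com/file/d/0B2I7_NGvCSVOT3ZNNWhpeFdFbTg/view?usp=sharing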

Background

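I have two very similar dedicated VMs running PostgreSQL. One runs PG 8.4 and the other PG 9.4, but both use nearly identical configurations. Some other differences are listed in the table below.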

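This question has two parts:

  1. Why does PG 8.4 choose a faster query plan than 9.4 for version 1 of the query? The two plans have similar estimated costs, but the PG 9.4 plan takes roughly an order of magnitude longer to run.
  2. Why does changing the WHERE clause to reference a.id instead of r.a_id change the query plan so dramatically?

    System information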

                      | PG 8.4     | PG 9.4
    :---------------- | :--------- | :-----------
    OS                | CentOS 5.5 | Ubuntu 14.04
    RAM               | 16GB       | 16GB
    CPUs              | 4 x vCPU   | 4 x vCPU
    VMware VM version | 4          | 8
    Disk Size         | 50GB       | 200GB
    

    System benchmarks

                                    | PG 8.4   | PG 9.4
    :------------------------------ | :------- | :-------
    dd write (32GB)                 | 38 MB/s  | 277 MB/s
    dd read (32GB)                  | 241 MB/s | 243 MB/s
    bonnie++ 1.03 block write K/sec | 208941   | 248528
    bonnie++ 1.03 block read K/sec  | 172184   | 321814
    bonnie++ seek /sec              | 543.5    | 1559.8
    pgbench (-s 1000, -t 2000) TPS  | 345      | 325
    

    Queries

    Version 1

    ```

    EXPLAIN ANALYZE SELECT DISTINCT
        t.id
    FROM
        a
    INNER JOIN b --USING(a_id)
        ON b.a_id = a.id
    INNER JOIN r -- USING(a_id)
        ON r.a_id = a.id
    INNER JOIN t
        ON t.session_id = '1'
            AND a.inst_id = t.inst_id
            AND b.study_id = t.study_id
            AND r.q_id = t.q_id
    WHERE
        r.a_id IN (1, 2, 3)
        AND (
            r.q_id in ('q1', 'q2', 'q3') OR
            r.q_id in ('q4', 'q5', 'q6') OR
            r.q_id in ('q7', 'q8', 'q9') OR
            r.q_id in ('q10', 'q11', 'q12')
        )
    

    ```

    Version 2

    ```

    EXPLAIN ANALYZE SELECT DISTINCT
        t.id
    FROM
        a
    INNER JOIN b --USING(a_id)
        ON b.a_id = a.id
    INNER JOIN r -- USING(a_id)
        ON r.a_id = a.id
    INNER JOIN t
        ON t.session_id = '1'
            AND a.inst_id = t.inst_id
            AND b.study_id = t.study_id
            AND r.q_id = t.q_id
    WHERE
        a.id IN (1, 2, 3) -- << THIS IS WHAT CHANGED
        AND (
            r.q_id in ('q1', 'q2', 'q3') OR
            r.q_id in ('q4', 'q5', 'q6') OR
            r.q_id in ('q7', 'q8', 'q9') OR
            r.q_id in ('q10', 'q11', 'q12')
        )
    

    ```
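    The two versions differ only in which side of the join equality r.a_id = a.id carries the IN list. As far as I know, PostgreSQL derives implied conditions across a join equality for `column = constant`, but not for `IN (...)` lists, so each version gives the planner a selective filter on only one of the two tables. A possible experiment under that assumption (not tested against the linked data set) is to repeat the restriction on both join columns. It cannot change the result, because the join condition already forces r.a_id = a.id for every joined row:

    ```sql
    -- Hypothetical variant: the IN list appears on both a.id and r.a_id so the
    -- planner can filter both tables before joining. Logically redundant,
    -- because the join condition already guarantees r.a_id = a.id.
    EXPLAIN ANALYZE SELECT DISTINCT
        t.id
    FROM
        a
    INNER JOIN b
        ON b.a_id = a.id
    INNER JOIN r
        ON r.a_id = a.id
    INNER JOIN t
        ON t.session_id = '1'
            AND a.inst_id = t.inst_id
            AND b.study_id = t.study_id
            AND r.q_id = t.q_id
    WHERE
        a.id IN (1, 2, 3)
        AND r.a_id IN (1, 2, 3) -- repeated restriction
        AND (
            r.q_id in ('q1', 'q2', 'q3') OR
            r.q_id in ('q4', 'q5', 'q6') OR
            r.q_id in ('q7', 'q8', 'q9') OR
            r.q_id in ('q10', 'q11', 'q12')
        );
    ```

    Comparing the resulting plan on both servers would show whether row estimates on both sides of the join are what the 9.4 planner is missing in version 1.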

    Query performance (total time in ms; the PG 9.4 figures are planning time plus execution time)

                       | PG 8.4 | PG 9.4
    :----------------- | :----- | :-----
    Version 1 (ms)     | 0.718  | 12.355
    Version 2 (ms)     | 1.799  | 3.288
    

    EXPLAIN plans

    PG 8.4, version 1

    "HashAggregate  (cost=63.78..63.79 rows=1 width=4) (actual time=0.603..0.603 rows=1 loops=1)"
    "  ->  Hash Join  (cost=61.02..63.78 rows=1 width=4) (actual time=0.540..0.593 rows=1 loops=1)"
    "        Hash Cond: ((b.a_id = a.id) AND (b.study_id = t.study_id))"
    "        ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=8) (actual time=0.015..0.041 rows=100 loops=1)"
    "        ->  Hash  (cost=60.99..60.99 rows=2 width=16) (actual time=0.513..0.513 rows=1 loops=1)"
    "              ->  Hash Join  (cost=58.22..60.99 rows=2 width=16) (actual time=0.435..0.511 rows=1 loops=1)"
    "                    Hash Cond: ((a.id = r.a_id) AND ((a.inst_id)::text = (t.inst_id)::text))"
    "                    ->  Seq Scan on a  (cost=0.00..2.00 rows=100 width=6) (actual time=0.005..0.026 rows=100 loops=1)"
    "                    ->  Hash  (cost=58.13..58.13 rows=6 width=44) (actual time=0.418..0.418 rows=3 loops=1)"
    "                          ->  Hash Join  (cost=17.54..58.13 rows=6 width=44) (actual time=0.044..0.416 rows=3 loops=1)"
    "                                Hash Cond: ((r.q_id)::text = (t.q_id)::text)"
    "                                ->  Seq Scan on r  (cost=0.00..40.44 rows=23 width=7) (actual time=0.014..0.368 rows=34 loops=1)"
    "                                      Filter: ((a_id = ANY ('{1,2,3}'::integer[])) AND (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10, (...)"
    "                                ->  Hash  (cost=17.50..17.50 rows=3 width=72) (actual time=0.020..0.020 rows=1 loops=1)"
    "                                      ->  Seq Scan on t  (cost=0.00..17.50 rows=3 width=72) (actual time=0.006..0.016 rows=1 loops=1)"
    "                                            Filter: ((session_id)::text = '1'::text)"
    "Total runtime: 0.718 ms"
    

    PG 8.4, version 2

    "HashAggregate  (cost=61.77..61.78 rows=1 width=4) (actual time=1.685..1.686 rows=1 loops=1)"
    "  ->  Hash Join  (cost=22.41..61.77 rows=1 width=4) (actual time=0.243..1.677 rows=1 loops=1)"
    "        Hash Cond: (((a.inst_id)::text = (t.inst_id)::text) AND (b.study_id = t.study_id) AND ((r.q_id)::text = (t.q_id)::text))"
    "        ->  Hash Join  (cost=4.85..43.94 rows=23 width=9) (actual time=0.203..1.626 rows=34 loops=1)"
    "              Hash Cond: (r.a_id = b.a_id)"
    "              ->  Seq Scan on r  (cost=0.00..35.95 rows=776 width=7) (actual time=0.024..1.120 rows=1198 loops=1)"
    "                    Filter: (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[])))"
    "              ->  Hash  (cost=4.82..4.82 rows=3 width=14) (actual time=0.138..0.138 rows=3 loops=1)"
    "                    ->  Hash Join  (cost=2.41..4.82 rows=3 width=14) (actual time=0.057..0.135 rows=3 loops=1)"
    "                          Hash Cond: (b.a_id = a.id)"
    "                          ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=8) (actual time=0.006..0.049 rows=100 loops=1)"
    "                          ->  Hash  (cost=2.38..2.38 rows=3 width=6) (actual time=0.040..0.040 rows=3 loops=1)"
    "                                ->  Seq Scan on a  (cost=0.00..2.38 rows=3 width=6) (actual time=0.008..0.035 rows=3 loops=1)"
    "                                      Filter: (id = ANY ('{1,2,3}'::integer[]))"
    "        ->  Hash  (cost=17.50..17.50 rows=3 width=72) (actual time=0.020..0.020 rows=1 loops=1)"
    "              ->  Seq Scan on t  (cost=0.00..17.50 rows=3 width=72) (actual time=0.008..0.016 rows=1 loops=1)"
    "                    Filter: ((session_id)::text = '1'::text)"
    "Total runtime: 1.799 ms"
    

    PG 9.4, version 1

    "HashAggregate  (cost=63.54..63.55 rows=1 width=4) (actual time=11.393..11.394 rows=1 loops=1)"
    "  Group Key: t.id"
    "  ->  Nested Loop  (cost=19.96..63.54 rows=1 width=4) (actual time=0.223..11.387 rows=1 loops=1)"
    "        Join Filter: ((b.a_id = r.a_id) AND ((t.q_id)::text = (r.q_id)::text))"
    "        Rows Removed by Join Filter: 1155"
    "        ->  Hash Join  (cost=19.96..22.72 rows=1 width=44) (actual time=0.202..0.294 rows=34 loops=1)"
    "              Hash Cond: ((b.a_id = a.id) AND (b.study_id = t.study_id))"
    "              ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=8) (actual time=0.016..0.030 rows=100 loops=1)"
    "              ->  Hash  (cost=19.93..19.93 rows=2 width=44) (actual time=0.174..0.174 rows=34 loops=1)"
    "                    Buckets: 1024  Batches: 1  Memory Usage: 2kB"
    "                    ->  Hash Join  (cost=17.54..19.93 rows=2 width=44) (actual time=0.079..0.155 rows=34 loops=1)"
    "                          Hash Cond: ((a.inst_id)::text = (t.inst_id)::text)"
    "                          ->  Seq Scan on a  (cost=0.00..2.00 rows=100 width=6) (actual time=0.007..0.026 rows=100 loops=1)"
    "                          ->  Hash  (cost=17.50..17.50 rows=3 width=72) (actual time=0.025..0.025 rows=1 loops=1)"
    "                                Buckets: 1024  Batches: 1  Memory Usage: 1kB"
    "                                ->  Seq Scan on t  (cost=0.00..17.50 rows=3 width=72) (actual time=0.012..0.021 rows=1 loops=1)"
    "                                      Filter: ((session_id)::text = '1'::text)"
    "                                      Rows Removed by Filter: 35"
    "        ->  Seq Scan on r  (cost=0.00..40.44 rows=25 width=7) (actual time=0.008..0.314 rows=34 loops=34)"
    "              Filter: ((a_id = ANY ('{1,2,3}'::integer[])) AND (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[]))))"
    "              Rows Removed by Filter: 1164"
    "Planning time: 0.856 ms"
    "Execution time: 11.499 ms"
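
    Rough arithmetic on the numbers in this plan (reading the plan output, not the planner's exact cost formula): the Hash Join feeding the Nested Loop was estimated at 1 row but produced 34, so the inner sequential scan on r ran 34 times instead of once, and almost all of the execution time is spent there.

    $$
    \begin{aligned}
    \text{estimated cost with 1 outer row} &\approx 22.72 + 1 \times 40.44 = 63.16 \quad (\text{reported: } 63.54)\\
    \text{same estimate with 34 outer rows} &\approx 22.72 + 34 \times 40.44 = 1397.68\\
    \text{time spent scanning } r &\approx 34 \times 0.314\,\text{ms} \approx 10.68\,\text{ms} \quad (\text{of } 11.387\,\text{ms})\\
    \text{join-filter pairs} &= 34 \times 34 = 1156 = 1155\ \text{removed} + 1\ \text{kept}
    \end{aligned}
    $$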
    

    PG 9.4, version 2

    "HashAggregate  (cost=62.23..62.24 rows=1 width=4) (actual time=2.197..2.197 rows=1 loops=1)"
    "  Group Key: t.id"
    "  ->  Nested Loop  (cost=19.95..62.22 rows=1 width=4) (actual time=0.193..2.189 rows=1 loops=1)"
    "        Join Filter: ((b.a_id = r.a_id) AND ((a.inst_id)::text = (t.inst_id)::text) AND (b.study_id = t.study_id))"
    "        Rows Removed by Join Filter: 299"
    "        ->  Hash Join  (cost=17.54..56.68 rows=12 width=44) (actual time=0.065..1.761 rows=100 loops=1)"
    "              Hash Cond: ((r.q_id)::text = (t.q_id)::text)"
    "              ->  Seq Scan on r  (cost=0.00..35.95 rows=819 width=7) (actual time=0.030..1.271 rows=1198 loops=1)"
    "                    Filter: (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[])))"
    "              ->  Hash  (cost=17.50..17.50 rows=3 width=72) (actual time=0.022..0.022 rows=1 loops=1)"
    "                    Buckets: 1024  Batches: 1  Memory Usage: 1kB"
    "                    ->  Seq Scan on t  (cost=0.00..17.50 rows=3 width=72) (actual time=0.008..0.018 rows=1 loops=1)"
    "                          Filter: ((session_id)::text = '1'::text)"
    "                          Rows Removed by Filter: 35"
    "        ->  Materialize  (cost=2.41..4.83 rows=3 width=14) (actual time=0.001..0.003 rows=3 loops=100)"
    "              ->  Hash Join  (cost=2.41..4.82 rows=3 width=14) (actual time=0.119..0.172 rows=3 loops=1)"
    "                    Hash Cond: (b.a_id = a.id)"
    "                    ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=8) (actual time=0.007..0.028 rows=100 loops=1)"
    "                    ->  Hash  (cost=2.38..2.38 rows=3 width=6) (actual time=0.064..0.064 rows=3 loops=1)"
    "                          Buckets: 1024  Batches: 1  Memory Usage: 1kB"
    "                          ->  Seq Scan on a  (cost=0.00..2.38 rows=3 width=6) (actual time=0.016..0.058 rows=3 loops=1)"
    "                                Filter: (id = ANY ('{1,2,3}'::integer[]))"
    "                                Rows Removed by Filter: 97"
    "Planning time: 0.979 ms"
    "Execution time: 2.309 ms"
    

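    Update

    I want to make clear that I am very grateful for the tuning and data-modeling advice offered. However, this example is a simplification of a system-wide problem, and we are hoping to find a way to bring performance back to where it was before the upgrade to PG 9.4 without modifying the existing schema. I hope that is possible.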
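
    Since the goal is to recover the old behaviour without schema changes, one diagnostic (not a fix) is to discourage nested-loop joins for a single transaction with the standard `enable_nestloop` planner setting, then check whether 9.4 chooses a hash-join plan like the one 8.4 uses for version 1. `SET LOCAL` only lasts until the end of the transaction, so other sessions and later queries are unaffected:

    ```sql
    BEGIN;
    -- Adds a large cost penalty to nested loops (it does not strictly forbid them),
    -- for this transaction only.
    SET LOCAL enable_nestloop = off;

    EXPLAIN ANALYZE SELECT DISTINCT
        t.id
    FROM a
    INNER JOIN b ON b.a_id = a.id
    INNER JOIN r ON r.a_id = a.id
    INNER JOIN t
        ON t.session_id = '1'
            AND a.inst_id = t.inst_id
            AND b.study_id = t.study_id
            AND r.q_id = t.q_id
    WHERE r.a_id IN (1, 2, 3)
      AND (r.q_id IN ('q1', 'q2', 'q3') OR r.q_id IN ('q4', 'q5', 'q6') OR
           r.q_id IN ('q7', 'q8', 'q9') OR r.q_id IN ('q10', 'q11', 'q12'));

    -- Ending the transaction discards the SET LOCAL value.
    ROLLBACK;
    ```

    Turning `enable_nestloop` off server-wide is a blunt instrument that can make other queries slower, so this is best treated as a way to confirm which join choice is responsible.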

1 answer

Answer (score: 1)

IMHO the query below is much simpler, at least to read:

EXPLAIN ANALYZE SELECT DISTINCT t.id
FROM t
INNER JOIN a ON a.inst_id = t.inst_id
INNER JOIN r ON r.a_id = a.id AND r.q_id = t.q_id
INNER JOIN b ON b.a_id = a.id AND b.study_id = t.study_id
WHERE t.session_id = '1'
  AND r.a_id IN (1, 2, 3)
  AND r.q_id IN ('q1', 'q2', 'q3'
                ,'q4', 'q5', 'q6'
                ,'q7', 'q8', 'q9'
                ,'q10', 'q11', 'q12')
    ;
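  • Adding PRIMARY KEY constraints on the serial id columns would help a lot
  • Adding FOREIGN KEY constraints on the referencing JOIN columns (plus UNIQUE constraints on the referenced columns) would help even more
  • Adding supporting indexes for the FKs completes the job
  • Run VACUUM ANALYZE after making these changes
  • BTW, your data model seems to contain a cycle: {study_id, inst_id} appear in tables {a, t, b}. This may indicate redundancy (or a missed candidate key)
  • Your new machine seems to have fast seeks, so you could try lowering random_page_cost to about 2, assuming effective_cache_size and shared_buffers are high enough. (But get your data model into shape before tuning.)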
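
The suggestions above could look roughly like this. It is a sketch under assumptions the question does not confirm: that a.id and t.id are unique serial keys with no existing constraints, and that every b.a_id and r.a_id refers to an existing a.id. The constraint and index names are made up, and the {study_id, inst_id} columns are left alone because their intended uniqueness is unknown.

```sql
-- Primary keys on the serial id columns (assumed unique, not yet constrained).
ALTER TABLE a ADD CONSTRAINT a_pkey PRIMARY KEY (id);
ALTER TABLE t ADD CONSTRAINT t_pkey PRIMARY KEY (id);

-- Foreign keys on the referencing join columns; a.id is already
-- unique because of its primary key.
ALTER TABLE b ADD CONSTRAINT b_a_id_fkey FOREIGN KEY (a_id) REFERENCES a (id);
ALTER TABLE r ADD CONSTRAINT r_a_id_fkey FOREIGN KEY (a_id) REFERENCES a (id);

-- PostgreSQL does not index the referencing side of a foreign key
-- automatically, so add supporting indexes (hypothetical names).
CREATE INDEX b_a_id_idx ON b (a_id);
CREATE INDEX r_a_id_q_id_idx ON r (a_id, q_id);

-- Refresh planner statistics (VACUUM cannot run inside a transaction block).
VACUUM ANALYZE;

-- Only after the model is in shape: inspect the cache settings, then try
-- cheaper random I/O for this session (the default random_page_cost is 4).
SHOW effective_cache_size;
SHOW shared_buffers;
SET random_page_cost = 2;
```

Re-running the EXPLAIN ANALYZE queries in the same session afterwards shows whether the constraints, indexes, and cost setting change the chosen plan.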