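This is a simplified distillation of a more complex situation in our production environment. The data and setup used for this test case can be found at https://drive.google.com/file/d/0B2I7_NGvCSVOT3ZNNWhpeFdFbTg/view?usp=sharing.
I have two very similar dedicated virtual machines running PostgreSQL. One runs PG 8.4 and the other runs PG 9.4, but both use nearly identical configurations. The tables below list some of the other differences.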
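This is a two-part question:

* Why does changing the `WHERE` clause to reference `a.id` instead of `r.a_id` modify the query plan so significantly?

|                   | PG 8.4     | PG 9.4       |
| :---------------- | :--------- | :----------- |
| OS                | CentOS 5.5 | Ubuntu 14.04 |
| RAM               | 16GB       | 16GB         |
| CPUs              | 4 x vCPU   | 4 x vCPU     |
| VMware VM version | 4          | 8            |
| Disk Size         | 50GB       | 200GB        |
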
|                                 | PG 8.4   | PG 9.4   |
| :------------------------------ | :------- | :------- |
| dd write (32GB)                 | 38 MB/s  | 277 MB/s |
| dd read (32GB)                  | 241 MB/s | 243 MB/s |
| bonnie++ 1.03 block write K/sec | 208941   | 248528   |
| bonnie++ 1.03 block read K/sec  | 172184   | 321814   |
| bonnie++ seek /sec              | 543.5    | 1559.8   |
| pgbench (-s 1000, -t 2000) TPS  | 345      | 325      |

Version 1
```
EXPLAIN ANALYZE SELECT DISTINCT
t.id
FROM
a
INNER JOIN b --USING(a_id)
ON b.a_id = a.id
INNER JOIN r -- USING(a_id)
ON r.a_id = a.id
INNER JOIN t
ON t.session_id = '1'
AND a.inst_id = t.inst_id
AND b.study_id = t.study_id
AND r.q_id = t.q_id
WHERE
r.a_id IN (1, 2, 3)
AND (
r.q_id in ('q1', 'q2', 'q3') OR
r.q_id in ('q4', 'q5', 'q6') OR
r.q_id in ('q7', 'q8', 'q9') OR
r.q_id in ('q10', 'q11', 'q12')
)
```
Version 2
```
EXPLAIN ANALYZE SELECT DISTINCT
t.id
FROM
a
INNER JOIN b --USING(a_id)
ON b.a_id = a.id
INNER JOIN r -- USING(a_id)
ON r.a_id = a.id
INNER JOIN t
ON t.session_id = '1'
AND a.inst_id = t.inst_id
AND b.study_id = t.study_id
AND r.q_id = t.q_id
WHERE
a.id IN (1, 2, 3) -- << THIS IS WHAT CHANGED
AND (
r.q_id in ('q1', 'q2', 'q3') OR
r.q_id in ('q4', 'q5', 'q6') OR
r.q_id in ('q7', 'q8', 'q9') OR
r.q_id in ('q10', 'q11', 'q12')
)
```
|                | PG 8.4 | PG 9.4 |
| -------------- | ------ | ------ |
| version 1 (ms) | 0.718  | 12.355 |
| version 2 (ms) | 1.799  | 3.288  |

PG 8.4, version 1
"HashAggregate (cost=63.78..63.79 rows=1 width=4) (actual time=0.603..0.603 rows=1 loops=1)"
" -> Hash Join (cost=61.02..63.78 rows=1 width=4) (actual time=0.540..0.593 rows=1 loops=1)"
" Hash Cond: ((b.a_id = a.id) AND (b.study_id = t.study_id))"
" -> Seq Scan on b (cost=0.00..2.00 rows=100 width=8) (actual time=0.015..0.041 rows=100 loops=1)"
" -> Hash (cost=60.99..60.99 rows=2 width=16) (actual time=0.513..0.513 rows=1 loops=1)"
" -> Hash Join (cost=58.22..60.99 rows=2 width=16) (actual time=0.435..0.511 rows=1 loops=1)"
" Hash Cond: ((a.id = r.a_id) AND ((a.inst_id)::text = (t.inst_id)::text))"
" -> Seq Scan on a (cost=0.00..2.00 rows=100 width=6) (actual time=0.005..0.026 rows=100 loops=1)"
" -> Hash (cost=58.13..58.13 rows=6 width=44) (actual time=0.418..0.418 rows=3 loops=1)"
" -> Hash Join (cost=17.54..58.13 rows=6 width=44) (actual time=0.044..0.416 rows=3 loops=1)"
" Hash Cond: ((r.q_id)::text = (t.q_id)::text)"
" -> Seq Scan on r (cost=0.00..40.44 rows=23 width=7) (actual time=0.014..0.368 rows=34 loops=1)"
" Filter: ((a_id = ANY ('{1,2,3}'::integer[])) AND (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10, (...)"
" -> Hash (cost=17.50..17.50 rows=3 width=72) (actual time=0.020..0.020 rows=1 loops=1)"
" -> Seq Scan on t (cost=0.00..17.50 rows=3 width=72) (actual time=0.006..0.016 rows=1 loops=1)"
" Filter: ((session_id)::text = '1'::text)"
"Total runtime: 0.718 ms"
PG 8.4, version 2
"HashAggregate (cost=61.77..61.78 rows=1 width=4) (actual time=1.685..1.686 rows=1 loops=1)"
" -> Hash Join (cost=22.41..61.77 rows=1 width=4) (actual time=0.243..1.677 rows=1 loops=1)"
" Hash Cond: (((a.inst_id)::text = (t.inst_id)::text) AND (b.study_id = t.study_id) AND ((r.q_id)::text = (t.q_id)::text))"
" -> Hash Join (cost=4.85..43.94 rows=23 width=9) (actual time=0.203..1.626 rows=34 loops=1)"
" Hash Cond: (r.a_id = b.a_id)"
" -> Seq Scan on r (cost=0.00..35.95 rows=776 width=7) (actual time=0.024..1.120 rows=1198 loops=1)"
" Filter: (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[])))"
" -> Hash (cost=4.82..4.82 rows=3 width=14) (actual time=0.138..0.138 rows=3 loops=1)"
" -> Hash Join (cost=2.41..4.82 rows=3 width=14) (actual time=0.057..0.135 rows=3 loops=1)"
" Hash Cond: (b.a_id = a.id)"
" -> Seq Scan on b (cost=0.00..2.00 rows=100 width=8) (actual time=0.006..0.049 rows=100 loops=1)"
" -> Hash (cost=2.38..2.38 rows=3 width=6) (actual time=0.040..0.040 rows=3 loops=1)"
" -> Seq Scan on a (cost=0.00..2.38 rows=3 width=6) (actual time=0.008..0.035 rows=3 loops=1)"
" Filter: (id = ANY ('{1,2,3}'::integer[]))"
" -> Hash (cost=17.50..17.50 rows=3 width=72) (actual time=0.020..0.020 rows=1 loops=1)"
" -> Seq Scan on t (cost=0.00..17.50 rows=3 width=72) (actual time=0.008..0.016 rows=1 loops=1)"
" Filter: ((session_id)::text = '1'::text)"
"Total runtime: 1.799 ms"
PG 9.4, version 1
"HashAggregate (cost=63.54..63.55 rows=1 width=4) (actual time=11.393..11.394 rows=1 loops=1)"
" Group Key: t.id"
" -> Nested Loop (cost=19.96..63.54 rows=1 width=4) (actual time=0.223..11.387 rows=1 loops=1)"
" Join Filter: ((b.a_id = r.a_id) AND ((t.q_id)::text = (r.q_id)::text))"
" Rows Removed by Join Filter: 1155"
" -> Hash Join (cost=19.96..22.72 rows=1 width=44) (actual time=0.202..0.294 rows=34 loops=1)"
" Hash Cond: ((b.a_id = a.id) AND (b.study_id = t.study_id))"
" -> Seq Scan on b (cost=0.00..2.00 rows=100 width=8) (actual time=0.016..0.030 rows=100 loops=1)"
" -> Hash (cost=19.93..19.93 rows=2 width=44) (actual time=0.174..0.174 rows=34 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 2kB"
" -> Hash Join (cost=17.54..19.93 rows=2 width=44) (actual time=0.079..0.155 rows=34 loops=1)"
" Hash Cond: ((a.inst_id)::text = (t.inst_id)::text)"
" -> Seq Scan on a (cost=0.00..2.00 rows=100 width=6) (actual time=0.007..0.026 rows=100 loops=1)"
" -> Hash (cost=17.50..17.50 rows=3 width=72) (actual time=0.025..0.025 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Seq Scan on t (cost=0.00..17.50 rows=3 width=72) (actual time=0.012..0.021 rows=1 loops=1)"
" Filter: ((session_id)::text = '1'::text)"
" Rows Removed by Filter: 35"
" -> Seq Scan on r (cost=0.00..40.44 rows=25 width=7) (actual time=0.008..0.314 rows=34 loops=34)"
" Filter: ((a_id = ANY ('{1,2,3}'::integer[])) AND (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[]))))"
" Rows Removed by Filter: 1164"
"Planning time: 0.856 ms"
"Execution time: 11.499 ms"
PG 9.4, version 2
"HashAggregate (cost=62.23..62.24 rows=1 width=4) (actual time=2.197..2.197 rows=1 loops=1)"
" Group Key: t.id"
" -> Nested Loop (cost=19.95..62.22 rows=1 width=4) (actual time=0.193..2.189 rows=1 loops=1)"
" Join Filter: ((b.a_id = r.a_id) AND ((a.inst_id)::text = (t.inst_id)::text) AND (b.study_id = t.study_id))"
" Rows Removed by Join Filter: 299"
" -> Hash Join (cost=17.54..56.68 rows=12 width=44) (actual time=0.065..1.761 rows=100 loops=1)"
" Hash Cond: ((r.q_id)::text = (t.q_id)::text)"
" -> Seq Scan on r (cost=0.00..35.95 rows=819 width=7) (actual time=0.030..1.271 rows=1198 loops=1)"
" Filter: (((q_id)::text = ANY ('{q1,q2,q3}'::text[])) OR ((q_id)::text = ANY ('{q4,q5,q6}'::text[])) OR ((q_id)::text = ANY ('{q7,q8,q9}'::text[])) OR ((q_id)::text = ANY ('{q10,q11,q12}'::text[])))"
" -> Hash (cost=17.50..17.50 rows=3 width=72) (actual time=0.022..0.022 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Seq Scan on t (cost=0.00..17.50 rows=3 width=72) (actual time=0.008..0.018 rows=1 loops=1)"
" Filter: ((session_id)::text = '1'::text)"
" Rows Removed by Filter: 35"
" -> Materialize (cost=2.41..4.83 rows=3 width=14) (actual time=0.001..0.003 rows=3 loops=100)"
" -> Hash Join (cost=2.41..4.82 rows=3 width=14) (actual time=0.119..0.172 rows=3 loops=1)"
" Hash Cond: (b.a_id = a.id)"
" -> Seq Scan on b (cost=0.00..2.00 rows=100 width=8) (actual time=0.007..0.028 rows=100 loops=1)"
" -> Hash (cost=2.38..2.38 rows=3 width=6) (actual time=0.064..0.064 rows=3 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Seq Scan on a (cost=0.00..2.38 rows=3 width=6) (actual time=0.016..0.058 rows=3 loops=1)"
" Filter: (id = ANY ('{1,2,3}'::integer[]))"
" Rows Removed by Filter: 97"
"Planning time: 0.979 ms"
"Execution time: 2.309 ms"
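I want to make it clear that tuning and data-modeling advice is very much appreciated. However, this example is a simplification of a system-wide problem, and we are hoping to find a way to restore performance to what it was before the upgrade to PG 9.4 without modifying the existing schema. Hopefully that is possible.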
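
In the PG 9.4 version 1 plan, the outer input of the Nested Loop was estimated at 1 row but actually returned 34, so the Seq Scan on `r` ran 34 times. One possible way to check whether that join choice accounts for the slowdown is to disable nested loops for a single transaction and re-run the query. This is only a diagnostic sketch, not something that was run to produce the timings above, and `enable_nestloop = off` is assumed here purely as a test switch, not as a permanent setting.

```sql
BEGIN;

-- SET LOCAL limits the planner setting to this transaction only.
SET LOCAL enable_nestloop = off;

-- Version 1 of the query, unchanged apart from layout.
EXPLAIN ANALYZE SELECT DISTINCT
    t.id
FROM
    a
    INNER JOIN b ON b.a_id = a.id
    INNER JOIN r ON r.a_id = a.id
    INNER JOIN t
        ON t.session_id = '1'
        AND a.inst_id = t.inst_id
        AND b.study_id = t.study_id
        AND r.q_id = t.q_id
WHERE
    r.a_id IN (1, 2, 3)
    AND (
        r.q_id IN ('q1', 'q2', 'q3') OR
        r.q_id IN ('q4', 'q5', 'q6') OR
        r.q_id IN ('q7', 'q8', 'q9') OR
        r.q_id IN ('q10', 'q11', 'q12')
    );

-- Nothing was modified; ROLLBACK simply discards the SET LOCAL.
ROLLBACK;
```

If the resulting plan on 9.4 resembles the 8.4 hash-join plan and runs in comparable time, that would point at the row estimate rather than at the hardware or OS differences listed in the tables.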
Answer 0 (score: 1)
IMHO, the query below is much simpler, at least to read.
```
EXPLAIN ANALYZE SELECT DISTINCT t.id
FROM t
INNER JOIN a ON a.inst_id = t.inst_id
INNER JOIN r ON r.a_id = a.id AND r.q_id = t.q_id
INNER JOIN b ON b.a_id = a.id AND b.study_id = t.study_id
WHERE t.session_id = '1'
AND r.a_id IN (1, 2, 3)
AND r.q_id IN ('q1', 'q2', 'q3'
              ,'q4', 'q5', 'q6'
              ,'q7', 'q8', 'q9'
              ,'q10', 'q11', 'q12')
;
```
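
* Adding `PRIMARY KEY` constraints will help a lot.
* Adding `FOREIGN KEY` constraints to the referencing JOIN fields (and `UNIQUE` constraints to the referenced fields) will help some more.
* Run `VACUUM ANALYZE`.
* Lower `random_page_cost` to about 2, assuming `effective_cache_size` and `shared_buffers` are sufficiently high. (But: get your data model in shape before tuning.)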
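
The suggestions above could be applied roughly as follows. This is a sketch only: the table and column names come from the queries in the question, but the constraint names are made up for illustration, and it assumes that `a.id` is unique and non-null and that every `b.a_id` and `r.a_id` value exists in `a`. If any of that does not hold, the statements will fail. Keys for the other tables are not shown because the question does not describe them.

```sql
-- Assumption: a.id is unique and contains no NULLs.
ALTER TABLE a ADD PRIMARY KEY (id);

-- Hypothetical constraint names; the referenced column a.id is
-- covered by the primary key added above.
ALTER TABLE b ADD CONSTRAINT b_a_id_fkey
    FOREIGN KEY (a_id) REFERENCES a (id);
ALTER TABLE r ADD CONSTRAINT r_a_id_fkey
    FOREIGN KEY (a_id) REFERENCES a (id);

-- Refresh planner statistics after the schema changes.
-- VACUUM cannot run inside a transaction block.
VACUUM ANALYZE a;
VACUUM ANALYZE b;
VACUUM ANALYZE r;
VACUUM ANALYZE t;

-- Session-level change only; a permanent value would go in postgresql.conf.
SET random_page_cost = 2;
```

Setting `random_page_cost` per session first makes it possible to compare `EXPLAIN ANALYZE` output before and after without affecting other connections.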