I have a view in a Postgres 9.5 database defined as:
CREATE OR REPLACE VIEW athlete_friends AS
SELECT CASE WHEN GROUPING(a0.data->>'type') > 0 THEN '' ELSE a0.data->>'type' END AS type,
       a0.athlete AS athlete_id,
       (ath1.data->>'firstname') || ' ' || (ath1.data->>'lastname') AS friend,
       SUM((a0.data->>'distance')::float) / 1000 AS km,
       SUM((a0.data->>'distance')::float) / 1609.3435021075907 AS miles
  FROM relationships r,
       activities a0,
       activities a1,
       athletes ath0,
       athletes ath1
 WHERE r.a = a0.id
   AND r.b = a1.id
   AND a0.athlete = ath0.id
   AND a1.athlete = ath1.id
 GROUP BY athlete_id, friend, ROLLUP(a0.data->>'type')
 ORDER BY km DESC
;
If I run EXPLAIN ANALYZE on the following query against this view:
EXPLAIN ANALYZE
SELECT * FROM athlete_friends
WHERE athlete_id = 164303 AND type = 'Run'
;
I get the following output from the query planner:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on athlete_friends (cost=787888.02..792007.94 rows=329594 width=84) (actual time=6319.545..6320.870 rows=2470 loops=1)
-> Sort (cost=787888.02..788712.00 rows=329594 width=2040) (actual time=6319.542..6320.002 rows=2470 loops=1)
Sort Key: ((sum(((a0.data ->> 'distance'::text))::double precision) / '1000'::double precision)) DESC
Sort Method: quicksort Memory: 241kB
-> GroupAggregate (cost=446430.06..467029.68 rows=329594 width=2040) (actual time=5265.129..6317.604 rows=2470 loops=1)
Group Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text))), ((a0.data ->> 'type'::text))
Group Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text)))
Filter: ((a0.athlete = 164303) AND (CASE WHEN (GROUPING(((a0.data ->> 'type'::text))) > 0) THEN ''::text ELSE ((a0.data ->> 'type'::text)) END = 'Run'::text))
Rows Removed by Filter: 165599
-> Sort (cost=446430.06..446842.05 rows=164797 width=2040) (actual time=5173.861..5794.534 rows=168380 loops=1)
Sort Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text))), ((a0.data ->> 'type'::text))
Sort Method: external merge Disk: 250184kB
-> Hash Join (cost=192910.52..286823.12 rows=164797 width=2040) (actual time=1835.459..3922.630 rows=168380 loops=1)
Hash Cond: (a0.athlete = ath0.id)
-> Hash Join (cost=190990.48..280576.80 rows=164892 width=2040) (actual time=1775.364..3137.595 rows=168380 loops=1)
Hash Cond: (r.a = a0.id)
-> Hash Join (cost=73519.18..85161.25 rows=164892 width=553) (actual time=437.949..846.261 rows=168380 loops=1)
Hash Cond: (a1.athlete = ath1.id)
-> Hash Join (cost=65144.29..69902.73 rows=164987 width=8) (actual time=342.602..560.793 rows=168380 loops=1)
Hash Cond: (r.b = a1.id)
-> Seq Scan on relationships r (cost=0.00..2489.87 rows=164987 width=8) (actual time=0.008..39.116 rows=168380 loops=1)
-> Hash (cost=61619.13..61619.13 rows=282013 width=8) (actual time=340.980..340.980 rows=288504 loops=1)
Buckets: 524288 Batches: 1 Memory Usage: 9937kB
-> Seq Scan on activities a1 (cost=0.00..61619.13 rows=282013 width=8) (actual time=0.008..207.661 rows=288504 loops=1)
-> Hash (cost=4461.73..4461.73 rows=46973 width=553) (actual time=95.096..95.096 rows=47527 loops=1)
Buckets: 32768 Batches: 2 Memory Usage: 13651kB
-> Seq Scan on athletes ath1 (cost=0.00..4461.73 rows=46973 width=553) (actual time=0.004..23.909 rows=47527 loops=1)
-> Hash (cost=61619.13..61619.13 rows=282013 width=1495) (actual time=1337.246..1337.246 rows=288504 loops=1)
Buckets: 16384 Batches: 32 Memory Usage: 13931kB
-> Seq Scan on activities a0 (cost=0.00..61619.13 rows=282013 width=1495) (actual time=0.006..242.787 rows=288504 loops=1)
-> Hash (cost=1332.88..1332.88 rows=46973 width=4) (actual time=59.890..59.890 rows=47527 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 1370kB
-> Index Only Scan using athletes_pkey on athletes ath0 (cost=0.29..1332.88 rows=46973 width=4) (actual time=0.032..39.125 rows=47527 loops=1)
Heap Fetches: 14978
Planning time: 6.831 ms
Execution time: 6369.308 ms
(36 rows)
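On the other hand, if I inline the view definition into the query and add the WHERE conditions to the join clause (edit: and remove the grouping level that the WHERE clause filters out), like this: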
EXPLAIN ANALYZE
SELECT a0.data->>'type',
       a0.athlete AS athlete_id,
       (ath1.data->>'firstname') || ' ' || (ath1.data->>'lastname') AS friend,
       SUM((a0.data->>'distance')::float) / 1000 AS km,
       SUM((a0.data->>'distance')::float) / 1609.3435021075907 AS miles
  FROM relationships r,
       activities a0,
       activities a1,
       athletes ath0,
       athletes ath1
 WHERE r.a = a0.id
   AND r.b = a1.id
   AND a0.athlete = ath0.id
   AND a0.data->>'type' = 'Ride'
   AND a0.athlete = 164303
   AND a1.athlete = ath1.id
 GROUP BY athlete_id, friend, a0.data->>'type'
 ORDER BY km DESC
;
the result is a much better query plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=63866.10..63866.13 rows=12 width=2040) (actual time=153.723..153.724 rows=11 loops=1)
Sort Key: ((sum(((a0.data ->> 'distance'::text))::double precision) / '1000'::double precision)) DESC
Sort Method: quicksort Memory: 17kB
-> HashAggregate (cost=63865.56..63865.89 rows=12 width=2040) (actual time=153.678..153.684 rows=11 loops=1)
Group Key: a0.athlete, (((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text)), (a0.data ->> 'type'::text)
-> Nested Loop (cost=1.42..63865.23 rows=12 width=2040) (actual time=67.977..153.513 rows=36 loops=1)
-> Nested Loop (cost=1.13..63860.56 rows=12 width=1495) (actual time=67.934..153.050 rows=36 loops=1)
-> Index Only Scan using athletes_id_int4_idx on athletes ath0 (cost=0.29..8.31 rows=1 width=4) (actual time=0.015..0.021 rows=1 loops=1)
Index Cond: (id = 164303)
Heap Fetches: 1
-> Nested Loop (cost=0.84..63852.14 rows=12 width=1495) (actual time=67.916..153.008 rows=36 loops=1)
-> Nested Loop (cost=0.42..63829.38 rows=12 width=1495) (actual time=67.902..152.739 rows=36 loops=1)
-> Seq Scan on activities a0 (cost=0.00..63734.23 rows=20 width=1495) (actual time=61.130..151.445 rows=469 loops=1)
Filter: ((athlete = 164303) AND ((data ->> 'type'::text) = 'Ride'::text))
Rows Removed by Filter: 288035
-> Index Only Scan using relationships_a_b_idx on relationships r (cost=0.42..4.71 rows=5 width=8) (actual time=0.002..0.002 rows=0 loops=469)
Index Cond: (a = a0.id)
Heap Fetches: 0
-> Index Scan using activities_pkey on activities a1 (cost=0.42..1.89 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=36)
Index Cond: (id = r.b)
-> Index Scan using athletes_pkey on athletes ath1 (cost=0.29..0.37 rows=1 width=553) (actual time=0.004..0.004 rows=1 loops=36)
Index Cond: (id = a1.athlete)
Planning time: 4.085 ms
Execution time: 153.821 ms
(24 rows)
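As far as I understand, the query planner should flatten the view and essentially run the second query, but it does not appear to do so. The question is: why? My non-default postgresql.conf parameters are: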
data_directory = '/var/lib/postgresql/9.5/main' # use data in another directory
hba_file = '/etc/postgresql/9.5/main/pg_hba.conf' # host-based authentication file
ident_file = '/etc/postgresql/9.5/main/pg_ident.conf' # ident configuration file
external_pid_file = '/var/run/postgresql/9.5-main.pid' # write an extra PID file
port = 5432 # (change requires restart)
max_connections = 100 # (change requires restart)
unix_socket_directories = '/var/run/postgresql' # comma-separated list of directories
ssl = true # (change requires restart)
ssl_cert_file = '/etc/ssl/certs/xxxxx.pem' # (change requires restart)
ssl_key_file = '/etc/ssl/private/xxxxxx.key' # (change requires restart)
shared_buffers = 512MB # min 128kB
work_mem = 16MB # min 64kB
dynamic_shared_memory_type = posix # the default is the first option
effective_cache_size = 1GB
default_statistics_target = 10000 # range 1-10000
log_min_duration_statement = 500 # -1 is disabled, 0 logs all statements
log_line_prefix = '%t [%p-%l] %q%u@%d ' # special values:
log_timezone = 'localtime'
stats_temp_directory = '/var/run/postgresql/9.5-main.pg_stat_tmp'
datestyle = 'iso, mdy'
timezone = 'localtime'
lc_messages = 'en_US.UTF-8' # locale for system error message
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8' # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
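Since the same database behaves differently on two machines, it may help to compare what each server actually reports rather than what the config file says. The following is only a sketch of how that could be checked; it is not part of the original setup, and it assumes nothing beyond the standard pg_settings view and the version() function:

```sql
-- Exact server version; minor releases can differ in planner behaviour.
SELECT version();

-- Settings whose value does not come from the built-in default,
-- along with where the value was set (config file, session, etc.).
SELECT name, setting, unit, source
  FROM pg_settings
 WHERE source NOT IN ('default', 'override')
 ORDER BY name;
```

Running both statements on each machine and diffing the output would show whether a setting such as work_mem or default_statistics_target, or the server version itself, differs between them.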
As a final point, if I copy the database to another machine (OS X rather than Ubuntu), I see the following query plan for the first case:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on athlete_friends (cost=67543.34..67602.07 rows=4698 width=84) (actual time=346.760..347.257 rows=2470 loops=1)
-> Sort (cost=67543.34..67555.09 rows=4698 width=2044) (actual time=346.758..346.912 rows=2470 loops=1)
Sort Key: ((sum(((a0.data ->> 'distance'::text))::double precision) / '1000'::double precision)) DESC
Sort Method: quicksort Memory: 310kB
-> GroupAggregate (cost=62801.69..63095.31 rows=4698 width=2044) (actual time=321.670..345.509 rows=2470 loops=1)
Group Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text))), ((a0.data ->> 'type'::text))
Group Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text)))
Filter: ((a0.athlete = 164303) AND (CASE WHEN (GROUPING(((a0.data ->> 'type'::text))) > 0) THEN ''::text ELSE ((a0.data ->> 'type'::text)) END = 'Run'::text))
Rows Removed by Filter: 2486
-> Sort (cost=62801.69..62807.56 rows=2349 width=2044) (actual time=321.498..329.618 rows=6153 loops=1)
Sort Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text))), ((a0.data ->> 'type'::text))
Sort Method: external merge Disk: 9344kB
-> Nested Loop (cost=50986.34..60587.67 rows=2349 width=2044) (actual time=140.392..233.788 rows=6153 loops=1)
-> Nested Loop (cost=50986.05..59685.76 rows=2349 width=1499) (actual time=140.380..199.376 rows=6153 loops=1)
-> Index Only Scan using athletes_id_int4_idx on athletes ath0 (cost=0.29..8.31 rows=1 width=4) (actual time=0.005..0.007 rows=1 loops=1)
Index Cond: (id = 164303)
Heap Fetches: 1
-> Nested Loop (cost=50985.76..59653.96 rows=2349 width=1499) (actual time=140.372..198.520 rows=6153 loops=1)
-> Hash Join (cost=50985.34..55404.22 rows=2349 width=1499) (actual time=140.360..182.452 rows=6153 loops=1)
Hash Cond: (r.a = a0.id)
-> Seq Scan on relationships r (cost=0.00..2029.83 rows=140683 width=8) (actual time=0.005..9.589 rows=140684 loops=1)
-> Hash (cost=50197.93..50197.93 rows=3953 width=1499) (actual time=137.099..137.099 rows=4045 loops=1)
Buckets: 4096 Batches: 2 Memory Usage: 3019kB
-> Seq Scan on activities a0 (cost=0.00..50197.93 rows=3953 width=1499) (actual time=1.886..129.948 rows=4045 loops=1)
Filter: (athlete = 164303)
Rows Removed by Filter: 232401
-> Index Scan using activities_id_int4_idx on activities a1 (cost=0.42..1.80 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=6153)
Index Cond: (id = r.b)
-> Index Scan using athletes_id_int4_idx on athletes ath1 (cost=0.29..0.36 rows=1 width=553) (actual time=0.001..0.002 rows=1 loops=6153)
Index Cond: (id = a1.athlete)
Planning time: 0.575 ms
Execution time: 363.249 ms
(32 rows)
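The key difference between this plan and the first one seems to be that here the planner works out that filtering on athlete = 164303 cuts out a large portion of the tables involved, and applies that filter early. I have run VACUUM / ANALYZE on the relevant tables, but it did not help.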
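For anyone trying to reproduce this, here is a minimal sketch of the statistics refresh plus a look at what the planner knows about the filtered column. The table names come from the view above; the pg_stats query is an illustrative check added here, not something taken from the original investigation:

```sql
-- Refresh visibility information and planner statistics for the tables in the view.
VACUUM ANALYZE relationships;
VACUUM ANALYZE activities;
VACUUM ANALYZE athletes;

-- Inspect the collected statistics for activities.athlete,
-- the column behind the athlete_id = 164303 filter.
SELECT tablename, attname, null_frac, n_distinct
  FROM pg_stats
 WHERE tablename = 'activities'
   AND attname = 'athlete';
```

If the two machines report similar statistics for this column, the difference between the plans is less likely to be a statistics problem.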