Why doesn't the PostgreSQL query planner flatten the view?

Time: 2016-03-07 20:57:04

Tags: postgresql

I have a view in a Postgres 9.5 database, defined as:

CREATE OR REPLACE VIEW athlete_friends AS
SELECT CASE WHEN GROUPING(a0.data->>'type') > 0 THEN '' ELSE a0.data->>'type' END AS type,
       a0.athlete AS athlete_id,
       (ath1.data->>'firstname') || ' ' || (ath1.data->>'lastname') AS friend,
       SUM((a0.data->>'distance')::float) / 1000 AS km,
       SUM((a0.data->>'distance')::float) / 1609.3435021075907 AS miles
FROM   relationships r,
       activities a0,
       activities a1,
       athletes ath0,
       athletes ath1
WHERE  r.a = a0.id
AND    r.b = a1.id
AND    a0.athlete = ath0.id
AND    a1.athlete = ath1.id
GROUP  BY athlete_id, friend, ROLLUP(a0.data->>'type')
ORDER  BY km DESC
;

If I run EXPLAIN ANALYZE on the following query against this view:

EXPLAIN ANALYZE
SELECT * FROM athlete_friends
WHERE athlete_id = 164303 AND type = 'Run'
;

I get the following output from the query planner:

                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on athlete_friends  (cost=787888.02..792007.94 rows=329594 width=84) (actual time=6319.545..6320.870 rows=2470 loops=1)
   ->  Sort  (cost=787888.02..788712.00 rows=329594 width=2040) (actual time=6319.542..6320.002 rows=2470 loops=1)
         Sort Key: ((sum(((a0.data ->> 'distance'::text))::double precision) / '1000'::double precision)) DESC
         Sort Method: quicksort  Memory: 241kB
         ->  GroupAggregate  (cost=446430.06..467029.68 rows=329594 width=2040) (actual time=5265.129..6317.604 rows=2470 loops=1)
               Group Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text))), ((a0.data ->> 'type'::text))
               Group Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text)))
               Filter: ((a0.athlete = 164303) AND (CASE WHEN (GROUPING(((a0.data ->> 'type'::text))) > 0) THEN ''::text ELSE ((a0.data ->> 'type'::text)) END = 'Run'::text))
               Rows Removed by Filter: 165599
               ->  Sort  (cost=446430.06..446842.05 rows=164797 width=2040) (actual time=5173.861..5794.534 rows=168380 loops=1)
                     Sort Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text))), ((a0.data ->> 'type'::text))
                     Sort Method: external merge  Disk: 250184kB
                     ->  Hash Join  (cost=192910.52..286823.12 rows=164797 width=2040) (actual time=1835.459..3922.630 rows=168380 loops=1)
                           Hash Cond: (a0.athlete = ath0.id)
                           ->  Hash Join  (cost=190990.48..280576.80 rows=164892 width=2040) (actual time=1775.364..3137.595 rows=168380 loops=1)
                                 Hash Cond: (r.a = a0.id)
                                 ->  Hash Join  (cost=73519.18..85161.25 rows=164892 width=553) (actual time=437.949..846.261 rows=168380 loops=1)
                                       Hash Cond: (a1.athlete = ath1.id)
                                       ->  Hash Join  (cost=65144.29..69902.73 rows=164987 width=8) (actual time=342.602..560.793 rows=168380 loops=1)
                                             Hash Cond: (r.b = a1.id)
                                             ->  Seq Scan on relationships r  (cost=0.00..2489.87 rows=164987 width=8) (actual time=0.008..39.116 rows=168380 loops=1)
                                             ->  Hash  (cost=61619.13..61619.13 rows=282013 width=8) (actual time=340.980..340.980 rows=288504 loops=1)
                                                   Buckets: 524288  Batches: 1  Memory Usage: 9937kB
                                                   ->  Seq Scan on activities a1  (cost=0.00..61619.13 rows=282013 width=8) (actual time=0.008..207.661 rows=288504 loops=1)
                                       ->  Hash  (cost=4461.73..4461.73 rows=46973 width=553) (actual time=95.096..95.096 rows=47527 loops=1)
                                             Buckets: 32768  Batches: 2  Memory Usage: 13651kB
                                             ->  Seq Scan on athletes ath1  (cost=0.00..4461.73 rows=46973 width=553) (actual time=0.004..23.909 rows=47527 loops=1)
                                 ->  Hash  (cost=61619.13..61619.13 rows=282013 width=1495) (actual time=1337.246..1337.246 rows=288504 loops=1)
                                       Buckets: 16384  Batches: 32  Memory Usage: 13931kB
                                       ->  Seq Scan on activities a0  (cost=0.00..61619.13 rows=282013 width=1495) (actual time=0.006..242.787 rows=288504 loops=1)
                           ->  Hash  (cost=1332.88..1332.88 rows=46973 width=4) (actual time=59.890..59.890 rows=47527 loops=1)
                                 Buckets: 65536  Batches: 1  Memory Usage: 1370kB
                                 ->  Index Only Scan using athletes_pkey on athletes ath0  (cost=0.29..1332.88 rows=46973 width=4) (actual time=0.032..39.125 rows=47527 loops=1)
                                       Heap Fetches: 14978
 Planning time: 6.831 ms
 Execution time: 6369.308 ms
(36 rows)

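On the other hand, if I copy the view definition into a standalone query and add the WHERE conditions next to the join conditions (edit: and remove the grouping level that the WHERE clause would filter out anyway), like this: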

EXPLAIN ANALYZE
SELECT a0.data->>'type',
       a0.athlete AS athlete_id,
       (ath1.data->>'firstname') || ' ' || (ath1.data->>'lastname') AS friend,
       SUM((a0.data->>'distance')::float) / 1000 AS km,
       SUM((a0.data->>'distance')::float) / 1609.3435021075907 AS miles
FROM   relationships r,
       activities a0,
       activities a1,
       athletes ath0,
       athletes ath1
WHERE  r.a = a0.id
AND    r.b = a1.id
AND    a0.athlete = ath0.id
AND    a0.data->>'type' = 'Ride'
AND    a0.athlete = 164303
AND    a1.athlete = ath1.id
GROUP  BY athlete_id, friend, a0.data->>'type'
ORDER  BY km DESC
;

the result is a much better query plan:

                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=63866.10..63866.13 rows=12 width=2040) (actual time=153.723..153.724 rows=11 loops=1)
   Sort Key: ((sum(((a0.data ->> 'distance'::text))::double precision) / '1000'::double precision)) DESC
   Sort Method: quicksort  Memory: 17kB
   ->  HashAggregate  (cost=63865.56..63865.89 rows=12 width=2040) (actual time=153.678..153.684 rows=11 loops=1)
         Group Key: a0.athlete, (((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text)), (a0.data ->> 'type'::text)
         ->  Nested Loop  (cost=1.42..63865.23 rows=12 width=2040) (actual time=67.977..153.513 rows=36 loops=1)
               ->  Nested Loop  (cost=1.13..63860.56 rows=12 width=1495) (actual time=67.934..153.050 rows=36 loops=1)
                     ->  Index Only Scan using athletes_id_int4_idx on athletes ath0  (cost=0.29..8.31 rows=1 width=4) (actual time=0.015..0.021 rows=1 loops=1)
                           Index Cond: (id = 164303)
                           Heap Fetches: 1
                     ->  Nested Loop  (cost=0.84..63852.14 rows=12 width=1495) (actual time=67.916..153.008 rows=36 loops=1)
                           ->  Nested Loop  (cost=0.42..63829.38 rows=12 width=1495) (actual time=67.902..152.739 rows=36 loops=1)
                                 ->  Seq Scan on activities a0  (cost=0.00..63734.23 rows=20 width=1495) (actual time=61.130..151.445 rows=469 loops=1)
                                       Filter: ((athlete = 164303) AND ((data ->> 'type'::text) = 'Ride'::text))
                                       Rows Removed by Filter: 288035
                                 ->  Index Only Scan using relationships_a_b_idx on relationships r  (cost=0.42..4.71 rows=5 width=8) (actual time=0.002..0.002 rows=0 loops=469)
                                       Index Cond: (a = a0.id)
                                       Heap Fetches: 0
                           ->  Index Scan using activities_pkey on activities a1  (cost=0.42..1.89 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=36)
                                 Index Cond: (id = r.b)
               ->  Index Scan using athletes_pkey on athletes ath1  (cost=0.29..0.37 rows=1 width=553) (actual time=0.004..0.004 rows=1 loops=36)
                     Index Cond: (id = a1.athlete)
 Planning time: 4.085 ms
 Execution time: 153.821 ms
(24 rows)

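As far as I understand, the query planner should flatten the view and effectively run the second query, but it does not seem to do so. The question is: why? My non-default postgresql.conf parameters are: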

data_directory = '/var/lib/postgresql/9.5/main'         # use data in another directory
hba_file = '/etc/postgresql/9.5/main/pg_hba.conf'       # host-based authentication file
ident_file = '/etc/postgresql/9.5/main/pg_ident.conf'   # ident configuration file
external_pid_file = '/var/run/postgresql/9.5-main.pid'                  # write an extra PID file
port = 5432                             # (change requires restart)
max_connections = 100                   # (change requires restart)
unix_socket_directories = '/var/run/postgresql' # comma-separated list of directories
ssl = true                              # (change requires restart)
ssl_cert_file = '/etc/ssl/certs/xxxxx.pem'          # (change requires restart)
ssl_key_file = '/etc/ssl/private/xxxxxx.key'         # (change requires restart)
shared_buffers = 512MB                  # min 128kB
work_mem = 16MB                         # min 64kB
dynamic_shared_memory_type = posix      # the default is the first option
effective_cache_size = 1GB
default_statistics_target = 10000       # range 1-10000
log_min_duration_statement = 500        # -1 is disabled, 0 logs all statements
log_line_prefix = '%t [%p-%l] %q%u@%d '                 # special values:
log_timezone = 'localtime'
stats_temp_directory = '/var/run/postgresql/9.5-main.pg_stat_tmp'
datestyle = 'iso, mdy'
timezone = 'localtime'
lc_messages = 'en_US.UTF-8'                     # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
default_text_search_config = 'pg_catalog.english'
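
One detail in the first plan that relates to these settings: the inner Sort reports "external merge  Disk: 250184kB" while work_mem is 16MB, so that sort spilled to disk. A minimal sketch of a session-local experiment to see how much of the runtime comes from the spill is below. The 512MB value is an arbitrary assumption for testing, not a recommendation, and an in-memory sort may need more memory than the on-disk figure suggests.

```sql
-- Session-local experiment only; 512MB is an assumed test value.
SET work_mem = '512MB';

-- Same query as before, with buffer statistics added to the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM athlete_friends
WHERE athlete_id = 164303 AND type = 'Run';

-- Return to the value configured in postgresql.conf.
RESET work_mem;
```

This does not address why the athlete filter is applied only after aggregation. It only separates the cost of the disk sort from the cost of joining all rows.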

As a final note, if I copy the database to another machine (OS X instead of Ubuntu), I see this query plan for the first case:

                                                                                    QUERY PLAN                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on athlete_friends  (cost=67543.34..67602.07 rows=4698 width=84) (actual time=346.760..347.257 rows=2470 loops=1)
   ->  Sort  (cost=67543.34..67555.09 rows=4698 width=2044) (actual time=346.758..346.912 rows=2470 loops=1)
         Sort Key: ((sum(((a0.data ->> 'distance'::text))::double precision) / '1000'::double precision)) DESC
         Sort Method: quicksort  Memory: 310kB
         ->  GroupAggregate  (cost=62801.69..63095.31 rows=4698 width=2044) (actual time=321.670..345.509 rows=2470 loops=1)
               Group Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text))), ((a0.data ->> 'type'::text))
               Group Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text)))
               Filter: ((a0.athlete = 164303) AND (CASE WHEN (GROUPING(((a0.data ->> 'type'::text))) > 0) THEN ''::text ELSE ((a0.data ->> 'type'::text)) END = 'Run'::text))
               Rows Removed by Filter: 2486
               ->  Sort  (cost=62801.69..62807.56 rows=2349 width=2044) (actual time=321.498..329.618 rows=6153 loops=1)
                     Sort Key: a0.athlete, ((((ath1.data ->> 'firstname'::text) || ' '::text) || (ath1.data ->> 'lastname'::text))), ((a0.data ->> 'type'::text))
                     Sort Method: external merge  Disk: 9344kB
                     ->  Nested Loop  (cost=50986.34..60587.67 rows=2349 width=2044) (actual time=140.392..233.788 rows=6153 loops=1)
                           ->  Nested Loop  (cost=50986.05..59685.76 rows=2349 width=1499) (actual time=140.380..199.376 rows=6153 loops=1)
                                 ->  Index Only Scan using athletes_id_int4_idx on athletes ath0  (cost=0.29..8.31 rows=1 width=4) (actual time=0.005..0.007 rows=1 loops=1)
                                       Index Cond: (id = 164303)
                                       Heap Fetches: 1
                                 ->  Nested Loop  (cost=50985.76..59653.96 rows=2349 width=1499) (actual time=140.372..198.520 rows=6153 loops=1)
                                       ->  Hash Join  (cost=50985.34..55404.22 rows=2349 width=1499) (actual time=140.360..182.452 rows=6153 loops=1)
                                             Hash Cond: (r.a = a0.id)
                                             ->  Seq Scan on relationships r  (cost=0.00..2029.83 rows=140683 width=8) (actual time=0.005..9.589 rows=140684 loops=1)
                                             ->  Hash  (cost=50197.93..50197.93 rows=3953 width=1499) (actual time=137.099..137.099 rows=4045 loops=1)
                                                   Buckets: 4096  Batches: 2  Memory Usage: 3019kB
                                                   ->  Seq Scan on activities a0  (cost=0.00..50197.93 rows=3953 width=1499) (actual time=1.886..129.948 rows=4045 loops=1)
                                                         Filter: (athlete = 164303)
                                                         Rows Removed by Filter: 232401
                                       ->  Index Scan using activities_id_int4_idx on activities a1  (cost=0.42..1.80 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=6153)
                                             Index Cond: (id = r.b)
                           ->  Index Scan using athletes_id_int4_idx on athletes ath1  (cost=0.29..0.36 rows=1 width=553) (actual time=0.001..0.002 rows=1 loops=6153)
                                 Index Cond: (id = a1.athlete)
 Planning time: 0.575 ms
 Execution time: 363.249 ms
(32 rows)
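
Because the same view produces different plans on the two machines, and the plans reference different indexes (athletes_pkey and activities_pkey on Ubuntu, athletes_id_int4_idx and activities_id_int4_idx on OS X), it may help to compare the server version, the planner statistics, and the index definitions on both. The queries below are a sketch using standard catalog views. Run them on each machine and compare the output.

```sql
-- Exact server version (minor releases can differ in planner behavior).
SELECT version();

-- Statistics target actually in effect for this session.
SHOW default_statistics_target;

-- Table-level row and page estimates used by the planner.
SELECT relname, reltuples, relpages
FROM   pg_class
WHERE  relname IN ('relationships', 'activities', 'athletes');

-- Column-level statistics for the filtered column.
SELECT tablename, attname, n_distinct, null_frac
FROM   pg_stats
WHERE  tablename = 'activities' AND attname = 'athlete';

-- Index definitions, to see which indexes exist on each machine.
SELECT tablename, indexname, indexdef
FROM   pg_indexes
WHERE  tablename IN ('relationships', 'activities', 'athletes');
```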

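The key difference between this plan and the first one seems to be that here the planner works out that filtering on athlete = 164303 greatly reduces the portion of the tables that has to be read. I have run VACUUM ANALYZE on the relevant tables, but it did not help.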
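
One way to test whether predicate placement is the deciding factor, without giving up the ROLLUP, is to apply the athlete filter explicitly before aggregation. The sketch below wraps the view's query in a set-returning SQL function. The function name athlete_friends_for and the parameter p_athlete are hypothetical, and the integer parameter type is an assumption based on the 4-byte width shown for the athlete column in the plans.

```sql
-- Hypothetical helper: same query as the view, but filtered before grouping.
-- Reuses the view's row type, so the output columns match athlete_friends.
CREATE OR REPLACE FUNCTION athlete_friends_for(p_athlete integer)
RETURNS SETOF athlete_friends
LANGUAGE sql STABLE
AS $$
    SELECT CASE WHEN GROUPING(a0.data->>'type') > 0 THEN '' ELSE a0.data->>'type' END AS type,
           a0.athlete AS athlete_id,
           (ath1.data->>'firstname') || ' ' || (ath1.data->>'lastname') AS friend,
           SUM((a0.data->>'distance')::float) / 1000 AS km,
           SUM((a0.data->>'distance')::float) / 1609.3435021075907 AS miles
    FROM   relationships r,
           activities a0,
           activities a1,
           athletes ath0,
           athletes ath1
    WHERE  r.a = a0.id
    AND    r.b = a1.id
    AND    a0.athlete = ath0.id
    AND    a1.athlete = ath1.id
    AND    a0.athlete = p_athlete   -- filter applied before aggregation
    GROUP  BY athlete_id, friend, ROLLUP(a0.data->>'type')
    ORDER  BY km DESC
$$;

-- Usage: the type filter still runs after the ROLLUP, on far fewer rows.
EXPLAIN ANALYZE
SELECT * FROM athlete_friends_for(164303) WHERE type = 'Run';
```

The type column is computed from GROUPING(), so a condition on it can only be evaluated after aggregation. Only the athlete condition is a candidate for being applied earlier, which is why it is the one moved into the function's WHERE clause.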

0 answers:

No answers yet.