Using Postgres 9.4, I am getting a disturbing performance penalty on a very simple query. It is a real blocker for my migration.
Note: the problem does not exist on my SQL Server 2014 DWH.
The idea is simple: count the rows of dim_activity, grouped by product_key and country_key. However, I want to retrieve product_string and country_string instead of product_key and country_key, so I join.
QUERY 1
SELECT
c.country_code,
b.real_box_sku,
COUNT(*)
FROM "DWH_activity".dim_activity a
LEFT JOIN "DWH_main".dim_box b ON a.box_key=b.box_key
LEFT JOIN "DWH_main".dim_country c ON a.country_key=c.country_key
WHERE a.event_type IN('New Customer','Reactivation','Active','Simple')
GROUP BY
c.country_code,
b.real_box_sku
This takes more than 4 minutes; here is the query plan:
"GroupAggregate (cost=1648008.80..1724929.95 rows=52200 width=25) (actual time=223523.127..271740.258 rows=425 loops=1)"
" Output: c.country_code, b.real_box_sku, count(*)"
" Group Key: c.country_code, b.real_box_sku"
" Buffers: shared hit=2 read=77611, temp read=51224 written=51224"
" -> Sort (cost=1648008.80..1667108.59 rows=7639915 width=25) (actual time=223518.029..269632.659 rows=7628149 loops=1)"
" Output: c.country_code, b.real_box_sku"
" Sort Key: c.country_code, b.real_box_sku"
" Sort Method: external merge Disk: 186416kB"
" Buffers: shared hit=2 read=77611, temp read=51224 written=51224"
" -> Hash Left Join (cost=59.51..408988.74 rows=7639915 width=25) (actual time=0.688..9803.950 rows=7628149 loops=1)"
" Output: c.country_code, b.real_box_sku"
" Hash Cond: (a.country_key = c.country_key)"
" Buffers: shared hit=2 read=77611"
" -> Hash Left Join (cost=35.79..303916.18 rows=7639915 width=15) (actual time=0.661..7129.092 rows=7628149 loops=1)"
" Output: a.country_key, b.real_box_sku"
" Hash Cond: (a.box_key = b.box_key)"
" Buffers: shared hit=2 read=77610"
" -> Seq Scan on "DWH_activity".dim_activity a (cost=0.00..198831.57 rows=7639915 width=6) (actual time=0.020..4032.800 rows=7628149 loops=1)"
" Output: a.country_key, a.vertical, a.cust_key, a.sub_key, a.fact_key, a.event_date_at, a.event_time_at, a.box_key, a.sub_type, a.activity_before, a.event_type, a.reason_key"
" Filter: ((a.event_type)::text = ANY ('{"New Customer",Reactivation,Active,Simple}'::text[]))"
" Rows Removed by Filter: 454422"
" Buffers: shared read=77593"
" -> Hash (cost=26.46..26.46 rows=746 width=17) (actual time=0.631..0.631 rows=746 loops=1)"
" Output: b.real_box_sku, b.box_key"
" Buckets: 1024 Batches: 1 Memory Usage: 37kB"
" Buffers: shared hit=2 read=17"
" -> Seq Scan on "DWH_main".dim_box b (cost=0.00..26.46 rows=746 width=17) (actual time=0.011..0.359 rows=746 loops=1)"
" Output: b.real_box_sku, b.box_key"
" Buffers: shared hit=2 read=17"
" -> Hash (cost=16.10..16.10 rows=610 width=16) (actual time=0.019..0.019 rows=14 loops=1)"
" Output: c.country_code, c.country_key"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" Buffers: shared read=1"
" -> Seq Scan on "DWH_main".dim_country c (cost=0.00..16.10 rows=610 width=16) (actual time=0.009..0.013 rows=14 loops=1)"
" Output: c.country_code, c.country_key"
" Buffers: shared read=1"
"Planning time: 0.447 ms"
"Execution time: 271781.990 ms"
QUERY 2: the same, but with only one JOIN (the performance is exactly the same whichever table I join).
SELECT
c.country_code,
COUNT(*)
FROM "DWH_activity".dim_activity a
LEFT JOIN "DWH_main".dim_country c ON a.country_key=c.country_key
GROUP BY
c.country_code
This takes 5 seconds; here is the query plan:
"HashAggregate (cost=309990.64..309992.64 rows=200 width=12) (actual time=5943.200..5943.200 rows=7 loops=1)"
" Output: c.country_code, count(*)"
" Group Key: c.country_code"
" Buffers: shared read=77594"
" -> Hash Left Join (cost=23.73..269577.79 rows=8082571 width=12) (actual time=0.037..3873.109 rows=8082571 loops=1)"
" Output: c.country_code"
" Hash Cond: (a.country_key = c.country_key)"
" Buffers: shared read=77594"
" -> Seq Scan on "DWH_activity".dim_activity a (cost=0.00..158418.71 rows=8082571 width=2) (actual time=0.016..1261.439 rows=8082571 loops=1)"
" Output: a.country_key, a.vertical, a.cust_key, a.sub_key, a.fact_key, a.event_date_at, a.event_time_at, a.box_key, a.sub_type, a.activity_before, a.event_type, a.reason_key"
" Buffers: shared read=77593"
" -> Hash (cost=16.10..16.10 rows=610 width=16) (actual time=0.013..0.013 rows=14 loops=1)"
" Output: c.country_code, c.country_key"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" Buffers: shared read=1"
" -> Seq Scan on "DWH_main".dim_country c (cost=0.00..16.10 rows=610 width=16) (actual time=0.006..0.011 rows=14 loops=1)"
" Output: c.country_code, c.country_key"
" Buffers: shared read=1"
"Planning time: 0.140 ms"
"Execution time: 5943.249 ms"
QUERY 3: I do the count first, then join on the aggregate.
SELECT
c.country_code,
b.real_box_sku,
COUNT(*)
FROM (
SELECT
country_key,
box_key,
COUNT(*)
FROM "DWH_activity".dim_activity a
GROUP BY
country_key,
box_key
) a
LEFT JOIN "DWH_main".dim_box b ON a.box_key=b.box_key
LEFT JOIN "DWH_main".dim_country c ON a.country_key=c.country_key
GROUP BY
c.country_code,
b.real_box_sku
This takes 3 seconds; here is the query plan:
"HashAggregate (cost=219263.82..219294.06 rows=3024 width=25) (actual time=3990.415..3990.492 rows=425 loops=1)"
" Output: c.country_code, b.real_box_sku, count(*)"
" Group Key: c.country_code, b.real_box_sku"
" Buffers: shared hit=35 read=77578"
" -> Hash Left Join (cost=219097.50..219241.14 rows=3024 width=25) (actual time=3989.832..3990.232 rows=440 loops=1)"
" Output: b.real_box_sku, c.country_code"
" Hash Cond: (a.country_key = c.country_key)"
" Buffers: shared hit=35 read=77578"
" -> Hash Left Join (cost=219073.78..219175.84 rows=3024 width=15) (actual time=3989.815..3990.073 rows=440 loops=1)"
" Output: a.country_key, b.real_box_sku"
" Hash Cond: (a.box_key = b.box_key)"
" Buffers: shared hit=34 read=77578"
" -> HashAggregate (cost=219037.99..219068.23 rows=3024 width=6) (actual time=3989.414..3989.508 rows=440 loops=1)"
" Output: a.country_key, a.box_key, count(*)"
" Group Key: a.country_key, a.box_key"
" Buffers: shared hit=32 read=77561"
" -> Seq Scan on "DWH_activity".dim_activity a (cost=0.00..158418.71 rows=8082571 width=6) (actual time=0.024..1115.551 rows=8082571 loops=1)"
" Output: a.country_key, a.vertical, a.cust_key, a.sub_key, a.fact_key, a.event_date_at, a.event_time_at, a.box_key, a.sub_type, a.activity_before, a.event_type, a.reason_key"
" Buffers: shared hit=32 read=77561"
" -> Hash (cost=26.46..26.46 rows=746 width=17) (actual time=0.378..0.378 rows=746 loops=1)"
" Output: b.real_box_sku, b.box_key"
" Buckets: 1024 Batches: 1 Memory Usage: 37kB"
" Buffers: shared hit=2 read=17"
" -> Seq Scan on "DWH_main".dim_box b (cost=0.00..26.46 rows=746 width=17) (actual time=0.011..0.210 rows=746 loops=1)"
" Output: b.real_box_sku, b.box_key"
" Buffers: shared hit=2 read=17"
" -> Hash (cost=16.10..16.10 rows=610 width=16) (actual time=0.010..0.010 rows=14 loops=1)"
" Output: c.country_code, c.country_key"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" Buffers: shared hit=1"
" -> Seq Scan on "DWH_main".dim_country c (cost=0.00..16.10 rows=610 width=16) (actual time=0.003..0.006 rows=14 loops=1)"
" Output: c.country_code, c.country_key"
" Buffers: shared hit=1"
"Planning time: 0.220 ms"
"Execution time: 3990.589 ms"
So it looks like aggregating over multiple joins makes my queries unusable.
As a consequence, most of our reports would have to be deprecated, or simply cannot run...
Thanks for your help ;)
EDIT 1: I have replaced the plain `EXPLAIN ANALYZE` output with `EXPLAIN (ANALYZE, VERBOSE, BUFFERS)`, following a_horse_with_no_name's request.