How can a Sort (before a Merge Join) increase the number of rows?

Time: 2018-02-01 19:04:27

Tags: sql postgresql query-optimization sql-execution-plan

I am working on a query that performs very badly:

SELECT COUNT(*)
FROM ps 
INNER JOIN p ON p.id = ps.patient_id 
INNER JOIN hh ON hh.id = ps.hh_id 
INNER JOIN cma ON cma.id = ps.cma_id 
INNER JOIN ter ters ON ( p.mm_id = ters.member_id ) 
    AND ( hh.mmis_id = ters.hh_mmis_id ) 
    AND ( cma.mmis_id = ters.cma_mmis_id ) 
    AND ( ps.start_date = ters.begin_date ) 
    AND ( CASE WHEN ps.oe_id = 1 THEN 'O' WHEN ps.oe_id = 2 THEN 'E' ELSE 'UNKNOWN_oe_id' END = ters.outreach_enrollment_code ) 
WHERE ters.status != 'Canceled' AND hh.id = 1;

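In the query plan I noticed that a Sort node (just before the Merge Join) emits more rows than it receives as input. That really confuses my mental model. What am I missing?

Here is an excerpt of the relevant part of the query plan: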

->  Sort  (cost=20956.81..21259.78 rows=121187 width=20) (actual time=140.260..3363.612 rows=29930138 loops=1)
    Output: ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_code_id'::text END)
    Sort Key: ps.start_date, ps.cma_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_code_id'::text END)
    Sort Method: quicksort  Memory: 12708kB
    Buffers: shared hit=4983
    ->  Bitmap Heap Scan on public.ps  (cost=2275.62..10724.46 rows=121187 width=20) (actual time=8.833..58.231 rows=123338 loops=1)
          Output: ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_code_id'::text END
          Recheck Cond: (ps.hh_id = 1)
          Heap Blocks: exact=4644
          Buffers: shared hit=4983
          ->  Bitmap Index Scan on index_ps_on_hh_id  (cost=0.00..2245.33 rows=121187 width=0) (actual time=8.138..8.138 rows=123338 loops=1)
                Index Cond: (ps.hh_id = 1)
                Buffers: shared hit=339

Note that the Bitmap Heap Scan emits 123,338 rows, and then the Sort emits 29,930,138!

People asked for the full query plan:

Aggregate  (cost=67207.10..67207.11 rows=1 width=0) (actual time=199297.658..199297.658 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=119969133 dirtied=1
  ->  Nested Loop  (cost=59884.61..67207.10 rows=1 width=0) (actual time=486.145..199261.336 rows=120386 loops=1)
        Join Filter: (ps.p_id = p.id)
        Rows Removed by Join Filter: 29809605
        Buffers: shared hit=119969133 dirtied=1
        ->  Merge Join  (cost=59884.19..62745.05 rows=8862 width=13) (actual time=486.052..19265.755 rows=29930082 loops=1)
              Output: ps.p_id, ters.member_id
              Merge Cond: ((ters.begin_date = ps.start_date) AND (cma.id = ps.cma_id) AND ((ters.oe_code)::text = (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_CODE_ID'::text END)))
              Buffers: shared hit=11752
              ->  Sort  (cost=38920.83..39082.15 rows=64528 width=23) (actual time=323.201..384.837 rows=130638 loops=1)
                    Output: hh.id, ters.member_id, ters.begin_date, ters.oe_code, cma.id
                    Sort Key: ters.begin_date, cma.id, ters.oe_code
                    Sort Method: quicksort  Memory: 13279kB
                    Buffers: shared hit=6769
                    ->  Hash Join  (cost=3194.35..33765.80 rows=64528 width=23) (actual time=18.149..194.187 rows=130638 loops=1)
                          Output: hh.id, ters.member_id, ters.begin_date, ters.oe_code, cma.id
                          Hash Cond: ((ters.cma_mmis_id)::text = (cma.mmis_id)::text)
                          Buffers: shared hit=6759
                          ->  Nested Loop  (cost=3190.12..32556.05 rows=64028 width=28) (actual time=18.075..150.186 rows=130108 loops=1)
                                Output: hh.id, ters.member_id, ters.cma_mmis_id, ters.begin_date, ters.oe_code
                                Buffers: shared hit=6754
                                ->  Seq Scan on public.hh  (cost=0.00..1.12 rows=1 width=10) (actual time=0.008..0.011 rows=1 loops=1)
                                      Output: hh.id, hh.name ... [redacted]
                                      Filter: (hh.id = 1)
                                      Rows Removed by Filter: 9
                                      Buffers: shared hit=1
                                ->  Bitmap Heap Scan on public.ters ters  (cost=3190.12..31678.69 rows=87623 width=33) (actual time=18.063..124.542 rows=130108 loops=1)
                                      Output: ters.member_id, ters.hh_mmis_id, ters.cma_mmis_id, ters.begin_date, ters.oe_code
                                      Recheck Cond: ((ters.hh_mmis_id)::text = (hh.mmis_id)::text)
                                      Filter: ((ters.status)::text <> 'Canceled'::text)
                                      Rows Removed by Filter: 49848
                                      Heap Blocks: exact=6060
                                      Buffers: shared hit=6753
                                      ->  Bitmap Index Scan on ters_hh_mmis_id_idx  (cost=0.00..3168.21 rows=138105 width=0) (actual time=16.965..16.965 rows=179956 loops=1)
                                            Index Cond: ((ters.hh_mmis_id)::text = (hh.mmis_id)::text)
                                            Buffers: shared hit=693
                          ->  Hash  (cost=2.99..2.99 rows=99 width=12) (actual time=0.052..0.052 rows=99 loops=1)
                                Output: cma.id, cma.mmis_id
                                Buckets: 1024  Batches: 1  Memory Usage: 5kB
                                Buffers: shared hit=2
                                ->  Seq Scan on public.cma  (cost=0.00..2.99 rows=99 width=12) (actual time=0.006..0.030 rows=99 loops=1)
                                      Output: cma.id, cma.mmis_id
                                      Buffers: shared hit=2
              ->  Sort  (cost=20956.81..21259.78 rows=121187 width=20) (actual time=162.834..3317.995 rows=29930138 loops=1)
                    Output: ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_CODE_ID'::text END)
                    Sort Key: ps.start_date, ps.cma_id, (CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_CODE_ID'::text END)
                    Sort Method: quicksort  Memory: 12708kB
                    Buffers: shared hit=4983
                    ->  Bitmap Heap Scan on public.ps  (cost=2275.62..10724.46 rows=121187 width=20) (actual time=9.940..72.463 rows=123338 loops=1)
                          Output: ps.p_id, ps.hh_id, ps.cma_id, ps.start_date, ps.oe_code_id, CASE WHEN (ps.oe_code_id = 1) THEN 'O'::text WHEN (ps.oe_code_id = 2) THEN 'E'::text ELSE 'UNKNOWN_oe_CODE_ID'::text END
                          Recheck Cond: (ps.hh_id = 1)
                          Heap Blocks: exact=4644
                          Buffers: shared hit=4983
                          ->  Bitmap Index Scan on index_ps_on_hh_id  (cost=0.00..2245.33 rows=121187 width=0) (actual time=9.226..9.226 rows=123338 loops=1)
                                Index Cond: (ps.hh_id = 1)
                                Buffers: shared hit=339
        ->  Index Scan using index_p_on_mm_id on public.p  (cost=0.42..0.49 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=29930082)
              Output: p.id, p.mm_id
              Index Cond: ((p.mm_id)::text = (ters.member_id)::text)
              Buffers: shared hit=119957381 dirtied=1
Planning time: 5.952 ms
Execution time: 199299.305 ms

1 answer:

Answer 0 (score: 0)

Try restructuring it so that the ON clause does not contain a CASE expression:

SELECT COUNT(*)
FROM ps 
INNER JOIN p ON p.id = ps.patient_id 
INNER JOIN hh ON hh.id = ps.hh_id 
INNER JOIN cma ON cma.id = ps.cma_id 
INNER JOIN ter ters ON ( p.mm_id = ters.member_id ) 
    AND ( hh.mmis_id = ters.hh_mmis_id ) 
    AND ( cma.mmis_id = ters.cma_mmis_id ) 
    AND ( ps.start_date = ters.begin_date ) 
    AND ( (ps.oe_id = 1 AND ters.outreach_enrollment_code = 'O')
        OR (ps.oe_id = 2 AND ters.outreach_enrollment_code = 'E')
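        -- IS NULL keeps rows with a NULL oe_id, which the ELSE branch of the original CASE also matched
        OR ((ps.oe_id IS NULL OR ps.oe_id NOT IN (1, 2)) AND ters.outreach_enrollment_code = 'UNKNOWN_oe_id'))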
WHERE ters.status != 'Canceled' AND hh.id = 1;

It will also help performance to make sure that the statistics on these tables are up to date and that there is an index on ps.oe_id.
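
As for the row counts in the question: the PostgreSQL documentation on EXPLAIN notes that when the outer (first) child of a Merge Join contains rows with duplicate key values, the inner (second) child is backed up and rescanned for the rows matching that key, and EXPLAIN ANALYZE counts those repeated emissions as if they were additional rows. The Sort over ps is the second child of the Merge Join in this plan, so its 29,930,138 would be a count of emissions rather than of distinct rows. The Python sketch below is a simplified model of that behaviour, not PostgreSQL's implementation; `CountingScan` and `merge_join` are hypothetical names, and the exact counts a real executor reports may differ.

```python
from typing import List, Optional


class CountingScan:
    """Sorted input that counts every row it hands out, like a plan node's 'rows'."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows
        self.pos = 0        # index of the next row to return
        self.emitted = 0    # total rows handed out, repeats included

    def fetch(self) -> Optional[int]:
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        self.emitted += 1
        return row


def merge_join(outer_keys: List[int], inner: CountingScan) -> int:
    """Join sorted outer keys against a sorted inner scan; return the joined row count."""
    joined = 0
    cur = inner.fetch()
    mark = None             # position of the first inner row of the last matched group
    prev_key = None
    for key in outer_keys:
        if key == prev_key and mark is not None:
            inner.pos = mark        # restore: back up to the start of the group
            cur = inner.fetch()     # re-read it, which counts as another emitted row
        while cur is not None and cur < key:
            cur = inner.fetch()     # skip inner rows with smaller keys
        if cur is not None and cur == key:
            mark = inner.pos - 1    # cur was fetched from this position
            while cur is not None and cur == key:
                joined += 1
                cur = inner.fetch()
        else:
            mark = None
        prev_key = key
    return joined


inner = CountingScan([1, 1, 2, 3])          # 4 rows on the inner side
joined = merge_join([1, 1, 1, 2], inner)    # key 1 appears three times on the outer side
print(joined, inner.emitted)                # prints "7 10"
```

In this toy run the inner input holds only 4 rows but reports 10 emitted, because the group for key 1 is re-read once per duplicate outer row. With many duplicates on the merge keys, the same effect could account for an inner Sort reporting far more rows than the scan beneath it.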