PostgreSQL: slow count(*), SELECT performance with an additional LEFT JOIN

Time: 2015-01-14 12:03:34

Tags: sql postgresql count database-performance

I have this select:

SELECT count(*) AS y0_
FROM erc.SUBJECTS this_
  LEFT OUTER JOIN fias.FIAS_HOUSE factaddres4_ ON this_.FACTADDRESS_REF = factaddres4_.houseId
  LEFT OUTER JOIN fias.FIAS_AGREGATE_ADDRESS factaddres5_ ON factaddres4_.houseId = factaddres5_.HOUSEID
  LEFT OUTER JOIN erc.REFITEMS okopf_1_ ON this_.OKOPF_REF = okopf_1_.ID
WHERE this_.IS_ACTUAL = 1 AND this_.IS_DELETE <> 1 AND NOT okopf_1_.CODE LIKE '5%' AND NOT okopf_1_.CODE = '0'

It runs for almost 18 seconds.

The subjects table has 376k rows, fias_house has 21 million rows, and fias_agregate_address has 130. EXPLAIN ANALYZE result:

Aggregate  (cost=1061561.33..1061561.34 rows=1 width=4) (actual time=17813.460..17813.460 rows=1 loops=1)
  ->  Hash Left Join  (cost=106687.31..1060683.61 rows=351088 width=4) (actual time=763.556..17741.820 rows=376196 loops=1)
        Hash Cond: ((factaddres4_.houseid)::text = (factaddres5_.houseid)::text)
        ->  Hash Join  (cost=106679.25..1059358.95 rows=351088 width=41) (actual time=760.772..17599.742 rows=376196 loops=1)
              Hash Cond: (this_.okopf_ref = okopf_1_.id)
              ->  Merge Right Join  (cost=106599.85..1053887.84 rows=376166 width=45) (actual time=759.211..17411.313 rows=376254 loops=1)
                    Merge Cond: ((factaddres4_.houseid)::text = (this_.factaddress_ref)::text)
                    ->  Index Only Scan using fias_house_pkey on fias_house factaddres4_  (cost=0.56..924229.05 rows=21084566 width=37) (actual time=0.013..8528.487 rows=19627484 loops=1)
                          Heap Fetches: 0
                    ->  Materialize  (cost=74125.25..76006.08 rows=376166 width=45) (actual time=759.171..980.286 rows=376254 loops=1)
                          ->  Sort  (cost=74125.25..75065.67 rows=376166 width=45) (actual time=759.167..863.495 rows=376254 loops=1)
                                Sort Key: this_.factaddress_ref
                                Sort Method: external sort  Disk: 6616kB
                                ->  Seq Scan on subjects this_  (cost=0.00..27715.88 rows=376166 width=45) (actual time=0.790..591.380 rows=376254 loops=1)
                                      Filter: ((is_delete <> 1) AND (is_actual = 1))
                                      Rows Removed by Filter: 138
              ->  Hash  (cost=53.85..53.85 rows=2044 width=4) (actual time=1.522..1.522 rows=2051 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 49kB
                    ->  Seq Scan on refitems okopf_1_  (cost=0.00..53.85 rows=2044 width=4) (actual time=0.019..0.930 rows=2051 loops=1)
                          Filter: (((code)::text !~~ '5%'::text) AND ((code)::text <> '0'::text))
                          Rows Removed by Filter: 139
        ->  Hash  (cost=6.36..6.36 rows=136 width=37) (actual time=2.761..2.761 rows=136 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 8kB
              ->  Seq Scan on fias_agregate_address factaddres5_  (cost=0.00..6.36 rows=136 width=37) (actual time=1.477..2.696 rows=136 loops=1)
Total runtime: 17814.728 ms

Without the join to FIAS_AGREGATE_ADDRESS the request completes much faster. EXPLAIN ANALYZE result:

Aggregate  (cost=34066.40..34066.41 rows=1 width=4) (actual time=510.291..510.292 rows=1 loops=1)
  ->  Hash Join  (cost=79.40..33188.44 rows=351183 width=4) (actual time=1.573..442.526 rows=376196 loops=1)
        Hash Cond: (this_.okopf_ref = okopf_1_.id)
        ->  Seq Scan on subjects this_  (cost=0.00..27715.88 rows=376267 width=45) (actual time=0.144..248.430 rows=376254 loops=1)
              Filter: ((is_delete <> 1) AND (is_actual = 1))
              Rows Removed by Filter: 138
        ->  Hash  (cost=53.85..53.85 rows=2044 width=4) (actual time=1.415..1.415 rows=2051 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 49kB
              ->  Seq Scan on refitems okopf_1_  (cost=0.00..53.85 rows=2044 width=4) (actual time=0.007..0.844 rows=2051 loops=1)
                    Filter: (((code)::text !~~ '5%'::text) AND ((code)::text <> '0'::text))
                    Rows Removed by Filter: 139
Total runtime: 510.367 ms
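
For reference, the query behind this second plan is not shown in the post. Reconstructed from the description (the first query with only the FIAS_AGREGATE_ADDRESS join removed), it would presumably look like the sketch below. Note that fias_house does not appear in the plan above even though the sketch still joins it. That would be consistent with the PostgreSQL planner removing a LEFT JOIN whose columns are never used and whose join key is unique, although the post itself does not confirm this.

```sql
-- Reconstruction from the description, not the author's verbatim query.
-- Assumption: the FIAS_HOUSE join was left in place and only the
-- FIAS_AGREGATE_ADDRESS join was dropped.
SELECT count(*) AS y0_
FROM erc.SUBJECTS this_
  LEFT OUTER JOIN fias.FIAS_HOUSE factaddres4_
    ON this_.FACTADDRESS_REF = factaddres4_.houseId
  LEFT OUTER JOIN erc.REFITEMS okopf_1_
    ON this_.OKOPF_REF = okopf_1_.ID
WHERE this_.IS_ACTUAL = 1
  AND this_.IS_DELETE <> 1
  AND NOT okopf_1_.CODE LIKE '5%'
  AND NOT okopf_1_.CODE = '0';
```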

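I found this article: https://wiki.postgresql.org/wiki/Slow_Counting but I can't use its suggestions, because the search conditions may vary.

I also can't just drop the FIAS_AGREGATE_ADDRESS join, because there may be search conditions on that table.

Maybe there is some clever index or other option that I have missed out of tiredness and stupidity?

UPD: after increasing work_mem from 8MB to 16MB, the EXPLAIN ANALYZE result became: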

Aggregate  (cost=1018467.07..1018467.08 rows=1 width=4) (actual time=18615.975..18615.975 rows=1 loops=1)
  ->  Hash Left Join  (cost=810328.24..1017589.11 rows=351183 width=4) (actual time=3.609..18543.596 rows=376196 loops=1)
        Hash Cond: ((factaddres4_.houseid)::text = (factaddres5_.houseid)::text)
        ->  Hash Join  (cost=810320.18..1016264.10 rows=351183 width=41) (actual time=2.190..18400.383 rows=376196 loops=1)
              Hash Cond: (this_.okopf_ref = okopf_1_.id)
              ->  Merge Left Join  (cost=810240.78..1010791.53 rows=376267 width=45) (actual time=0.838..18203.533 rows=376254 loops=1)
                    Merge Cond: ((this_.factaddress_ref)::text = (factaddres4_.houseid)::text)
                    ->  Index Scan using idx_subjects_factaddress_ref_btree on subjects this_  (cost=0.42..32907.70 rows=376267 width=45) (actual time=0.805..701.428 rows=376254 loops=1)
                    ->  Index Only Scan using fias_house_pkey on fias_house factaddres4_  (cost=0.56..924231.15 rows=21084706 width=37) (actual time=0.013..8885.002 rows=19627486 loops=1)
                          Heap Fetches: 0
              ->  Hash  (cost=53.85..53.85 rows=2044 width=4) (actual time=1.307..1.307 rows=2051 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 49kB
                    ->  Seq Scan on refitems okopf_1_  (cost=0.00..53.85 rows=2044 width=4) (actual time=0.010..0.802 rows=2051 loops=1)
                          Filter: (((code)::text !~~ '5%'::text) AND ((code)::text <> '0'::text))
                          Rows Removed by Filter: 139
        ->  Hash  (cost=6.36..6.36 rows=136 width=37) (actual time=1.396..1.396 rows=136 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 8kB
              ->  Seq Scan on fias_agregate_address factaddres5_  (cost=0.00..6.36 rows=136 width=37) (actual time=0.782..1.323 rows=136 loops=1)
Total runtime: 18616.060 ms

The "Sort" line disappeared, but the request time was not really affected.
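
The post does not say how work_mem was changed. One way to try such a setting for the current session only, and to get buffer statistics alongside the timings, might be the following sketch. The BUFFERS option is an addition for illustration and was not part of the plans shown above.

```sql
-- Session-local change; postgresql.conf is not touched.
SET work_mem = '16MB';

-- Same query as in the question, with buffer usage reported as well.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) AS y0_
FROM erc.SUBJECTS this_
  LEFT OUTER JOIN fias.FIAS_HOUSE factaddres4_
    ON this_.FACTADDRESS_REF = factaddres4_.houseId
  LEFT OUTER JOIN fias.FIAS_AGREGATE_ADDRESS factaddres5_
    ON factaddres4_.houseId = factaddres5_.HOUSEID
  LEFT OUTER JOIN erc.REFITEMS okopf_1_
    ON this_.OKOPF_REF = okopf_1_.ID
WHERE this_.IS_ACTUAL = 1
  AND this_.IS_DELETE <> 1
  AND NOT okopf_1_.CODE LIKE '5%'
  AND NOT okopf_1_.CODE = '0';

-- Go back to the session's default value afterwards.
RESET work_mem;
```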

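Every join is backed by a foreign key, and the referenced columns are primary keys everywhere. For example, the SUBJECTS table has the FK OKOPF_REF -> REFITEMS.ID, and ID is the primary key column in REFITEMS.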
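
To double-check this on a given database, the foreign keys declared on SUBJECTS can be listed from the system catalog. This is a generic sketch using pg_constraint, not something taken from the linked DDL, and it assumes the table was created with an unquoted name so that it resolves as erc.subjects.

```sql
-- List foreign-key constraints defined on erc.subjects.
SELECT conname,
       confrelid::regclass       AS referenced_table,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'f'                         -- 'f' = foreign key
  AND conrelid = 'erc.subjects'::regclass;
```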

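Here is a link to the DDL of these tables (including indexes): https://yadi.sk/d/-OxGh5BDdy4XW

I posted a trimmed query to make the analysis easier, but there can be different search conditions, such as substring searches in different tables. My worst case is this: for a simple search string (like '123') all joins are present (the search has to run on all tables), and the resulting count is still very large. So I can't omit those left joins.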
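
To make the worst case concrete, a search for '123' might produce a count query along these lines. The post does not say which columns are actually searched, so the columns in the OR group are picked from those visible in the trimmed query purely for illustration.

```sql
SELECT count(*) AS y0_
FROM erc.SUBJECTS this_
  LEFT OUTER JOIN fias.FIAS_HOUSE factaddres4_
    ON this_.FACTADDRESS_REF = factaddres4_.houseId
  LEFT OUTER JOIN fias.FIAS_AGREGATE_ADDRESS factaddres5_
    ON factaddres4_.houseId = factaddres5_.HOUSEID
  LEFT OUTER JOIN erc.REFITEMS okopf_1_
    ON this_.OKOPF_REF = okopf_1_.ID
WHERE this_.IS_ACTUAL = 1
  AND this_.IS_DELETE <> 1
  AND NOT okopf_1_.CODE LIKE '5%'
  AND NOT okopf_1_.CODE = '0'
  -- Hypothetical substring search: the real searched columns are unknown.
  AND (   this_.FACTADDRESS_REF LIKE '%123%'
       OR factaddres5_.HOUSEID  LIKE '%123%'
       OR okopf_1_.CODE         LIKE '%123%');
```

With a condition on factaddres5_ in the WHERE clause, the joins can no longer simply be left out of the count query, which is the situation described above.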

0 answers:

No answers