Adding more indexes to postgres causes "out of shared memory" error

Asked: 2014-01-13 23:46:19

Tags: postgresql indexing shared-memory database-partitioning

I have a fairly complex query that I'm trying to optimise in postgres 9.2. EXPLAIN ANALYZE gives this plan (explain.depesz.com):

        Merge Right Join  (cost=194965639.35..211592151.26 rows=420423258 width=616) (actual time=15898.283..15920.603 rows=17 loops=1)
       Merge Cond: ((((p.context -> 'device'::text)) = ((s.context -> 'device'::text))) AND (((p.context -> 'physical_port'::text)) = ((s.context -> 'physical_port'::text))))
       ->  Sort  (cost=68925.49..69073.41 rows=59168 width=393) (actual time=872.289..877.818 rows=39898 loops=1)
             Sort Key: ((p.context -> 'device'::text)), ((p.context -> 'physical_port'::text))
             Sort Method: quicksort  Memory: 27372kB
             ->  Seq Scan on ports__status p  (cost=0.00..64235.68 rows=59168 width=393) (actual time=0.018..60.931 rows=41395 loops=1)
       ->  Materialize  (cost=194896713.86..199620346.93 rows=284223403 width=299) (actual time=15023.710..15024.779 rows=17 loops=1)
             ->  Merge Left Join  (cost=194896713.86..198909788.42 rows=284223403 width=299) (actual time=15023.705..15024.765 rows=17 loops=1)
                   Merge Cond: ((((s.context -> 'device'::text)) = ((l1.context -> 'device'::text))) AND (((s.context -> 'physical_port'::text)) = ((l1.context -> 'physical_port'::text))))
                   ->  Sort  (cost=194894861.42..195605419.92 rows=284223403 width=224) (actual time=14997.225..14997.230 rows=17 loops=1)
                         Sort Key: ((s.context -> 'device'::text)), ((s.context -> 'physical_port'::text))
                         Sort Method: quicksort  Memory: 33kB
                         ->  GroupAggregate  (cost=100001395.98..122028709.71 rows=284223403 width=389) (actual time=14997.120..14997.186 rows=17 loops=1)
                               ->  Sort  (cost=100001395.98..100711954.49 rows=284223403 width=389) (actual time=14997.080..14997.080 rows=17 loops=1)
                                     Sort Key: ((d.context -> 'hostname'::text)), ((a.context -> 'ip_address'::text)), ((a.context -> 'mac_address'::text)), ((s.context -> 'device'::text)), ((s.context -> 'physical_port'::text)), s.created_at, s.updated_at, d.created_at, d.updated_at
                                     Sort Method: quicksort  Memory: 33kB
                                     ->  Merge Join  (cost=339026.99..9576678.30 rows=284223403 width=389) (actual time=14996.710..14996.749 rows=17 loops=1)
                                           Merge Cond: (((a.context -> 'mac_address'::text)) = ((s.context -> 'mac_address'::text)))
                                           ->  Sort  (cost=15038.32..15136.00 rows=39072 width=255) (actual time=23.556..23.557 rows=1 loops=1)
                                                 Sort Key: ((a.context -> 'mac_address'::text))
                                                 Sort Method: quicksort  Memory: 25kB
                                                 ->  Hash Join  (cost=471.88..12058.33 rows=39072 width=255) (actual time=13.482..23.548 rows=1 loops=1)
                                                       Hash Cond: ((a.context -> 'ip_address'::text) = (d.context -> 'ip_address'::text))
                                                       ->  Seq Scan on arps__arps a  (cost=0.00..8132.39 rows=46239 width=157) (actual time=0.007..11.191 rows=46259 loops=1)
                                                       ->  Hash  (cost=469.77..469.77 rows=169 width=98) (actual time=0.035..0.035 rows=1 loops=1)
                                                             Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                                             ->  Bitmap Heap Scan on ipam__dns d  (cost=9.57..469.77 rows=169 width=98) (actual time=0.023..0.023 rows=1 loops=1)
                                                                   Recheck Cond: ((context -> 'hostname'::text) = 'zglast-oracle03.slac.stanford.edu'::text)
                                                                   ->  Bitmap Index Scan on ipam__dns_hostname_index  (cost=0.00..9.53 rows=169 width=0) (actual time=0.017..0.017 rows=1 loops=1)
                                                                         Index Cond: ((context -> 'hostname'::text) = 'blah'::text)
                                           ->  Sort  (cost=323988.67..327625.84 rows=1454870 width=134) (actual time=14973.118..14973.120 rows=18 loops=1)
                                                 Sort Key: ((s.context -> 'mac_address'::text))
                                                 Sort Method: external sort  Disk: 214176kB
                                                 ->  Result  (cost=0.00..175064.84 rows=1454870 width=134) (actual time=0.016..1107.604 rows=1265154 loops=1)
                                                       ->  Append  (cost=0.00..175064.84 rows=1454870 width=134) (actual time=0.013..796.578 rows=1265154 loops=1)
                                                             ->  Seq Scan on spanning_tree__neighbour s  (cost=0.00..0.00 rows=1 width=98) (actual time=0.000..0.000 rows=0 loops=1)
                                                                   Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                             ->  Seq Scan on spanning_tree__neighbour__vlan38 s  (cost=0.00..469.32 rows=1220 width=129) (actual time=0.011..1.019 rows=823 loops=1)
                                                                   Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                                   Rows Removed by Filter: 403
                                                             ->  Seq Scan on spanning_tree__neighbour__vlan3 s  (cost=0.00..270.20 rows=1926 width=139) (actual time=0.017..0.971 rows=1882 loops=1)
                                                                   Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                                   Rows Removed by Filter: 54
                                                             ->  Seq Scan on spanning_tree__neighbour__vlan466 s  (cost=0.00..131.85 rows=306 width=141) (actual time=0.032..0.340 rows=276 loops=1)
                                                                   Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                                   Rows Removed by Filter: 32
                                                             ->  Seq Scan on spanning_tree__neighbour__vlan465 s  (cost=0.00..208.57 rows=842 width=142) (actual time=0.005..0.622 rows=768 loops=1)
                                                                   Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                                   Rows Removed by Filter: 78
                                                             ->  Seq Scan on spanning_tree__neighbour__vlan499 s  (cost=0.00..245.04 rows=481 width=142) (actual time=0.017..0.445 rows=483 loops=1)
                                                                   Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                             ->  Seq Scan on spanning_tree__neighbour__vlan176 s  (cost=0.00..346.36 rows=2576 width=131) (actual time=0.008..1.443 rows=2051 loops=1)
                                                                   Filter: ((context -> 'physical_port'::text) IS NOT NULL)
                                                                   Rows Removed by Filter: 538

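I'm a bit of a novice at reading plans, but I think it all comes down to my table spanning_tree__neighbour, which I have partitioned into numerous 'vlan' child tables. You can see that it is performing a seq scan on each of them.

So I wrote a quick and dirty bash script to create indexes on the child tables: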

CREATE INDEX spanning_tree__neighbour__vlan1_physical_port_index ON spanning_tree__neighbour__vlan1 ((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
CREATE INDEX spanning_tree__neighbour__vlan2_physical_port_index ON spanning_tree__neighbour__vlan2 ((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
CREATE INDEX spanning_tree__neighbour__vlan3_physical_port_index ON spanning_tree__neighbour__vlan3 ((context->'physical_port')) WHERE ((context->'physical_port') IS NOT NULL);
...

But after I have created a hundred or so of them, any query gives:

=> explain analyze select * from hosts where hostname='blah';
WARNING:  out of shared memory
ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.
Time: 34.757 ms

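Would setting max_locks_per_transaction actually help? What value should I use, given that my partitioned table has up to 4096 child tables?

Or have I read the plan wrong?

1 answer:

Answer 0: (score: 1)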

  

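"Would setting max_locks_per_transaction actually help?"

No, it won't.

Not before you fix your schema and your query, anyway.

A few issues jump out. Some have already been mentioned in the comments. In no particular order: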

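  1. The stats are off: the plan estimates hundreds of millions of rows where 17 come back. ANALYZE your tables, and increase maintenance_work_mem if you determine that autovacuum doesn't have enough memory to work properly.

  2. Steps such as Sort Method: external sort Disk: 214176kB indicate that you are sorting rows on disk. Increase work_mem accordingly.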

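  3. Steps such as Seq Scan on spanning_tree__neighbour__vlan176 s (cost=0.00..346.36 rows=2576 width=131) (actual time=0.008..1.443 rows=2051 loops=1) followed by an Append are dubious at best.

    Look... Use partitioned tables when you want to turn something unmanageable or impractical into something more manageable, for example moving a few billion rows of old data away from the few million that are used daily. Don't use them to turn a few million rows into 4,096 puny tables holding a measly 1k rows each on average.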

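  4. The next offender is stuff like Filter: ((context -> 'physical_port'::text) IS NOT NULL). ARGH.

    Don't ever, ever store things in hstore, JSON, XML or any other kind of EAV (entity-attribute-value) store if you care about the data that lands in it, especially if it appears in a where, join or sort (!) clause. No ifs, no buts: just change your schema.

    Plus, several of the fields that appear in your query could conveniently be stored using Postgres' network types instead of dumb text. Odds are they should all be indexed, too. (They would not appear in the plan otherwise.)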

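  5. You have a GroupAggregate underneath a left join. Typically, this indicates a query like: … left join (select agg_fn(…) … group by …) foo …. That is a big no-no in my experience. Pull it out of your query if you can.

    The plan is too long and terse to tell why it is doing that, but if select * from hosts where hostname='blah'; is anything to go by, you seem to be selecting absolutely everything you can reach in a single query.

    It is much cheaper and faster to find the few rows you actually want, and then run a handful of other queries to select the related data. So do that.

    If you still need to join with that aggregate subquery for some reason, be sure to look into window functions. More often than not, they let you run the aggregate directly on the current set of rows, sparing you the need for a bloody join.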

  6. Once you have done these steps, the default max_locks_per_transaction should be fine.
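
A rough sketch of how points 1, 2, 4, 5 and 6 might look in SQL. None of this comes from the asker's actual schema: the table spanning_tree_neighbour_flat, its columns and index names, and the 256MB value are assumptions made purely for illustration, and the right work_mem depends on available RAM and concurrency.

```sql
-- 1. Refresh planner statistics on the tables that appear in the plan.
ANALYZE ports__status;
ANALYZE arps__arps;
ANALYZE ipam__dns;
ANALYZE spanning_tree__neighbour;

-- 2. Give sorts more memory, for the current session only.
--    '256MB' is a guess, not a recommendation; tune it for your server.
SET work_mem = '256MB';

-- 4. Hypothetical single table with real, typed columns replacing the
--    hstore keys and the per-VLAN child tables.
CREATE TABLE spanning_tree_neighbour_flat (
    device        text        NOT NULL,
    physical_port text        NOT NULL,
    vlan          integer     NOT NULL,
    mac_address   macaddr     NOT NULL,   -- network type instead of text
    created_at    timestamptz NOT NULL DEFAULT now(),
    updated_at    timestamptz NOT NULL DEFAULT now()
);

-- Plain b-tree indexes on the columns used as join keys in the plan.
CREATE INDEX spanning_tree_neighbour_flat_mac_idx
    ON spanning_tree_neighbour_flat (mac_address);
CREATE INDEX spanning_tree_neighbour_flat_port_idx
    ON spanning_tree_neighbour_flat (device, physical_port);

-- 5. A window function computes the aggregate over the current rows,
--    with no join against a grouped subquery.
SELECT device,
       physical_port,
       mac_address,
       count(*) OVER (PARTITION BY device, physical_port) AS macs_on_port
FROM spanning_tree_neighbour_flat;

-- 6. Inspect the lock setting and the locks held by this session.
SHOW max_locks_per_transaction;
SELECT count(*) FROM pg_locks WHERE pid = pg_backend_pid();
```

For context on the error itself: the shared lock table has room for roughly max_locks_per_transaction * (max_connections + max_prepared_transactions) entries, and a query that touches the partitioned parent most likely locks every child table plus each of its indexes. With thousands of children, adding one index per child can therefore exhaust the table, which is why reducing the number of tables addresses the cause rather than the symptom.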