Query planner chooses a different index with "count" or "order by"

Asked: 2016-09-17 10:17:52

Tags: postgresql

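The following query looks a bit complicated, but it really just selects an XML column using a filter backed by a GiST index, and sorts by another "XML field" that is also indexed.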

Explain(analyze, buffers) Select
                content::text ,
                count(*) over()
            from B2HEAD.item
            where cluster = 'B2BOX' and 
            (  
                 ( concept=$b2$Test$b2$ )  and  
                 ( to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' ')) @@ to_tsquery('english', 'MNJ4989') )  
            ) 
            order by  (array_to_string_i(xpath('/Test/CreationDate//text()', content)::text[], ' ')::varchar(256)) desc 
            offset 0 limit 50;

The query is very slow and only uses the index on the ordering field (item_mi_idx01):

 Limit  (cost=0.43..6721.92 rows=50 width=997) (actual time=70038.645..70038.647 rows=1 loops=1)
   Buffers: shared hit=418875 read=435292
   ->  WindowAgg  (cost=0.43..784801.69 rows=5838 width=997) (actual time=70038.642..70038.642 rows=1 loops=1)
         Buffers: shared hit=418875 read=435292
         ->  Index Scan Backward using item_mi_idx01 on item  (cost=0.43..783240.03 rows=5838 width=997) (actual time=1.610..70038.487 rows=1 loops=1)
               Filter: (to_tsvector('english'::regconfig, array_to_string_i((xpath('/Test/BusinessDocNumber//text()'::text, content, '{}'::text[]))::text[], ' '::text)) @@ '''mnj4989'''::tsquery)
               Rows Removed by Filter: 1389827
               Buffers: shared hit=418875 read=435292
 Planning time: 8.939 ms
 Execution time: 70038.803 ms
(10 rows)

Removing the Order By clause makes the planner use the GiST index (item_mi_fts_idx04), and the query is very fast:

Limit  (cost=0.29..51.70 rows=50 width=997) (actual time=13.971..13.974 rows=1 loops=1)
  Buffers: shared read=1011
  ->  WindowAgg  (cost=0.29..6003.62 rows=5838 width=997) (actual time=13.969..13.970 rows=1 loops=1)
        Buffers: shared read=1011
        ->  Index Scan using item_mi_fts_idx04 on item  (cost=0.29..5930.64 rows=5838 width=997) (actual time=6.838..13.958 rows=1 loops=1)
              Index Cond: (to_tsvector('english'::regconfig, array_to_string_i((xpath('/Test/BusinessDocNumber//text()'::text, content, '{}'::text[]))::text[], ' '::text)) @@ '''mnj4989'''::tsquery)
              Buffers: shared read=1011
Planning time: 9.120 ms
Execution time: 14.044 ms
(9 rows)

The same happens if we keep the Order By and remove count(*) over() from the select:

Limit  (cost=7613.27..7613.39 rows=50 width=997) (actual time=12.362..12.362 rows=1 loops=1)
  Buffers: shared hit=1014
  ->  Sort  (cost=7613.27..7627.86 rows=5838 width=997) (actual time=12.359..12.359 rows=1 loops=1)
        Sort Key: ((array_to_string_i((xpath('/Test/CreationDate//text()'::text, content, '{}'::text[]))::text[], ' '::text))::character varying(256)) DESC
        Sort Method: quicksort  Memory: 27kB
        Buffers: shared hit=1014
        ->  Index Scan using item_mi_fts_idx04 on item  (cost=0.29..7419.33 rows=5838 width=997) (actual time=5.863..12.333 rows=1 loops=1)
              Index Cond: (to_tsvector('english'::regconfig, array_to_string_i((xpath('/Test/BusinessDocNumber//text()'::text, content, '{}'::text[]))::text[], ' '::text)) @@ '''mnj4989'''::tsquery)
              Buffers: shared hit=1011
Planning time: 8.201 ms
Execution time: 12.417 ms
(11 rows)

These queries are generated automatically, and we need to keep both the count and the order by.

What can be done to tell PostgreSQL to use the GiST index?

(VACUUM ANALYZE has already been run on the table.)
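One trick often used in this kind of situation, shown here only as an untested sketch, is to alter the ORDER BY expression so that it no longer matches the expression indexed by item_mi_idx01. PostgreSQL can only use an expression index when the query expression matches the index definition, so appending an empty string should take the ordered backward scan out of consideration while keeping the same sort order. The index definitions are not shown in this post, so the assumption that item_mi_idx01 is an expression index on the CreationDate text is exactly that, an assumption. The price is an explicit sort of every matching row.

```sql
-- Sketch only, not run against this schema.
-- Same query as above; the only change is "|| ''" in the ORDER BY, which
-- makes the expression differ from the one assumed to be indexed by
-- item_mi_idx01, so the planner cannot use that index to provide the order.
SELECT
    content::text,
    count(*) OVER ()
FROM B2HEAD.item
WHERE cluster = 'B2BOX'
  AND concept = $b2$Test$b2$
  AND to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' '))
      @@ to_tsquery('english', 'MNJ4989')
ORDER BY (array_to_string_i(xpath('/Test/CreationDate//text()', content)::text[], ' ')::varchar(256)) || '' DESC
OFFSET 0 LIMIT 50;
```

Since the queries are generated, this would have to be applied in the generator, and only where the full-text filter is expected to be selective.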

EDIT (following @a_horse_with_no_name's comment)

The rewritten query now looks like this:

Explain(analyze,buffers)
    With data as (
     Select
                    content::text ,
                    (array_to_string_i(xpath('/Test/CreationDate//text()', content)::text[], ' ')::varchar(256)) as orderField,
                    count(*) over()
                from B2HEAD.item
                where cluster = 'B2BOX'
                      and concept=$b2$Test$b2$
                      and to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' ')) @@ to_tsquery('english', 'MJ')
    )
    select * from data
    order by  data.orderField desc
    offset 0 limit 50;

This uses the GiST index (item_mi_fts_idx04) and is much faster:

  Limit  (cost=7803.00..7803.13 rows=50 width=556) (actual time=12.928..12.928 rows=0 loops=1)
  Buffers: shared hit=3 read=963
  CTE data
    ->  WindowAgg  (cost=0.29..7492.31 rows=5838 width=997) (actual time=12.908..12.908 rows=0 loops=1)
          Buffers: shared read=963
          ->  Index Scan using item_mi_fts_idx04 on item  (cost=0.29..5930.64 rows=5838 width=997) (actual time=12.903..12.903 rows=0 loops=1)
                Index Cond: (to_tsvector('english'::regconfig, array_to_string_i((xpath('/Test/BusinessDocNumber//text()'::text, content, '{}'::text[]))::text[], ' '::text)) @@ '''mj'''::tsquery)
                Buffers: shared read=963
  ->  Sort  (cost=310.69..325.29 rows=5838 width=556) (actual time=12.926..12.926 rows=0 loops=1)
        Sort Key: data.orderfield DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=3 read=963
        ->  CTE Scan on data  (cost=0.00..116.76 rows=5838 width=556) (actual time=12.909..12.909 rows=0 loops=1)
              Buffers: shared read=963
Planning time: 9.335 ms
Execution time: 13.015 ms
(16 rows)

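However, the sort is now a quicksort. Quicksort is fast for around 6,000 rows, but performance degrades very quickly (O(n log n)) as the number of rows grows (for 1M rows, the query would take over a minute).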

With the CTE, we could not find a way to make PostgreSQL use the BTREE index item_mi_idx01, which indexes the data that data.orderField refers to.

Any further help would be appreciated.

Here is another attempt; it uses a quicksort even when the condition on the full-text search is removed...

Explain(analyze,buffers)
With data as (
    Select
        content::text as xml
    from B2HEAD.item
    where cluster = 'B2BOX'
          and concept=$b2$Test$b2$
          and to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' ')) @@ to_tsquery('english', 'Aggregated')
    order by (array_to_string_i(xpath('/Test/CreationDate//text()', content)::text[], ' ')::varchar(256)) desc
)
select xml, count(*) over() from data
offset 0 limit 50;    

EDIT 2

count(*) over() appears to be the root of all the problems. Removing it from any of the queries above results in a fast query.

Working on the estimates; will keep this page updated.

We gave up!

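Changing the limit in the first query above changes the behavior! Even though the query actually returns a single row, increasing the limit makes the query use the GiST index.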
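A possible reading of this, offered as a hedged sketch rather than a confirmed diagnosis: the planner prices a Limit node as the child's startup cost plus a share of its run cost proportional to limit / estimated rows. The arithmetic below uses only figures copied from the first two plans in this post and lands close to the Limit costs they show. Note that the WindowAgg estimate starts at 0.43, as if rows could be streamed, while its first row actually arrived after about 70 seconds. Raising the limit raises the prorated cost of the ordered scan, which may be why a different plan wins. The LIMIT 100 row is a hypothetical value chosen for illustration.

```sql
-- Limit cost ~= child startup + (child total - child startup) * limit / estimated rows
-- Child costs and the 5838-row estimate are taken from the WindowAgg nodes above.
SELECT
    0.43 + (784801.69 - 0.43) * 50  / 5838 AS backward_scan_limit_50,   -- plan shows 6721.92
    0.29 + (6003.62   - 0.29) * 50  / 5838 AS gist_scan_limit_50,       -- plan shows 51.70
    0.43 + (784801.69 - 0.43) * 100 / 5838 AS backward_scan_limit_100;  -- hypothetical LIMIT 100
```

If this reading is right, the ordered scan only looks cheap because 50 rows out of an estimated 5838 is a small fraction, even though count(*) over() has to read all of its input before returning anything.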

The query planner has too many moving parts for us to try to program anything around it; we would just be trying to trick it.

0 answers