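The following query looks a bit complex, but it simply selects an XML column using a filter backed by a GIST index and orders the result by another "XML field" that is also indexed.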
Explain(analyze, buffers) Select
content::text ,
count(*) over()
from B2HEAD.item
where cluster = 'B2BOX' and
(
( concept=$b2$Test$b2$ ) and
( to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' ')) @@ to_tsquery('english', 'MNJ4989') )
)
order by (array_to_string_i(xpath('/Test/CreationDate//text()', content)::text[], ' ')::varchar(256)) desc
offset 0 limit 50;
The query is slow and only uses the index on the ordering field (item_mi_idx01):
Limit (cost=0.43..6721.92 rows=50 width=997) (actual time=70038.645..70038.647 rows=1 loops=1)
Buffers: shared hit=418875 read=435292
-> WindowAgg (cost=0.43..784801.69 rows=5838 width=997) (actual time=70038.642..70038.642 rows=1 loops=1)
Buffers: shared hit=418875 read=435292
-> Index Scan Backward using item_mi_idx01 on item (cost=0.43..783240.03 rows=5838 width=997) (actual time=1.610..70038.487 rows=1 loops=1)
Filter: (to_tsvector('english'::regconfig, array_to_string_i((xpath('/Test/BusinessDocNumber//text()'::text, content, '{}'::text[]))::text[], ' '::text)) @@ '''mnj4989'''::tsquery)
Rows Removed by Filter: 1389827
Buffers: shared hit=418875 read=435292
Planning time: 8.939 ms
Execution time: 70038.803 ms
(10 rows)
Removing the Order By clause causes the GIST index (item_mi_fts_idx04) to be used, and the query is very fast:
Limit (cost=0.29..51.70 rows=50 width=997) (actual time=13.971..13.974 rows=1 loops=1)
Buffers: shared read=1011
-> WindowAgg (cost=0.29..6003.62 rows=5838 width=997) (actual time=13.969..13.970 rows=1 loops=1)
Buffers: shared read=1011
-> Index Scan using item_mi_fts_idx04 on item (cost=0.29..5930.64 rows=5838 width=997) (actual time=6.838..13.958 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, array_to_string_i((xpath('/Test/BusinessDocNumber//text()'::text, content, '{}'::text[]))::text[], ' '::text)) @@ '''mnj4989'''::tsquery)
Buffers: shared read=1011
Planning time: 9.120 ms
Execution time: 14.044 ms
(9 rows)
The same happens if we keep the Order By and remove count(*) over() from the select list:
Limit (cost=7613.27..7613.39 rows=50 width=997) (actual time=12.362..12.362 rows=1 loops=1)
Buffers: shared hit=1014
-> Sort (cost=7613.27..7627.86 rows=5838 width=997) (actual time=12.359..12.359 rows=1 loops=1)
Sort Key: ((array_to_string_i((xpath('/Test/CreationDate//text()'::text, content, '{}'::text[]))::text[], ' '::text))::character varying(256)) DESC
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=1014
-> Index Scan using item_mi_fts_idx04 on item (cost=0.29..7419.33 rows=5838 width=997) (actual time=5.863..12.333 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, array_to_string_i((xpath('/Test/BusinessDocNumber//text()'::text, content, '{}'::text[]))::text[], ' '::text)) @@ '''mnj4989'''::tsquery)
Buffers: shared hit=1011
Planning time: 8.201 ms
Execution time: 12.417 ms
(11 rows)
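These queries are generated automatically, and we need to keep both the count and the order by.
What can be done to tell PostgreSQL to use the GIST index?
(VACUUM ANALYZE has been run on the table.)
EDIT (following @a_horse_with_no_name's comments)
The rewritten query now looks like this: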
Explain(analyze,buffers)
With data as (
Select
content::text ,
(array_to_string_i(xpath('/Test/CreationDate//text()', content)::text[], ' ')::varchar(256)) as orderField,
count(*) over()
from B2HEAD.item
where cluster = 'B2BOX'
and concept=$b2$Test$b2$
and to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' ')) @@ to_tsquery('english', 'MJ')
)
select * from data
order by data.orderField desc
offset 0 limit 50;
This uses the GIST index (item_mi_fts_idx04) and is faster:
Limit (cost=7803.00..7803.13 rows=50 width=556) (actual time=12.928..12.928 rows=0 loops=1)
Buffers: shared hit=3 read=963
CTE data
-> WindowAgg (cost=0.29..7492.31 rows=5838 width=997) (actual time=12.908..12.908 rows=0 loops=1)
Buffers: shared read=963
-> Index Scan using item_mi_fts_idx04 on item (cost=0.29..5930.64 rows=5838 width=997) (actual time=12.903..12.903 rows=0 loops=1)
Index Cond: (to_tsvector('english'::regconfig, array_to_string_i((xpath('/Test/BusinessDocNumber//text()'::text, content, '{}'::text[]))::text[], ' '::text)) @@ '''mj'''::tsquery)
Buffers: shared read=963
-> Sort (cost=310.69..325.29 rows=5838 width=556) (actual time=12.926..12.926 rows=0 loops=1)
Sort Key: data.orderfield DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=3 read=963
-> CTE Scan on data (cost=0.00..116.76 rows=5838 width=556) (actual time=12.909..12.909 rows=0 loops=1)
Buffers: shared read=963
Planning time: 9.335 ms
Execution time: 13.015 ms
(16 rows)
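However, the sort is now a quicksort, which is fast for around 6,000 rows, but its performance degrades quickly as the number of rows grows (O(n log n)); for 1M rows, the query will take over a minute.
With the CTE, we cannot find a way to make PostgreSQL use the BTREE index item_mi_idx01, which indexes the data that data.OrderField refers to.
Any further help would be greatly appreciated.
Here is another attempt; it also ends up using a quicksort, even when the plain-text search condition is removed...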
Explain(analyze,buffers)
With data as (
Select
content::text as xml
from B2HEAD.item
where cluster = 'B2BOX'
and concept=$b2$Test$b2$
and to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' ')) @@ to_tsquery('english', 'Aggregated')
order by (array_to_string_i(xpath('/Test/CreationDate//text()', content)::text[], ' ')::varchar(256)) desc
)
select xml, count(*) over() from data
offset 0 limit 50;
EDIT 2
count(*) over() appears to be the root of all the problems. Removing it from any of the queries above results in a fast query.
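Since the query without count(*) over() was planned with the GIST index, one thing that might be worth trying is to move the count into an uncorrelated scalar subquery, so that the paged part of the query keeps the shape of the fast one. The following is only a sketch under that assumption: the filter and sort expressions are copied from the first query, the alias total_count is made up for illustration, and nothing guarantees that the planner will choose the same plan, so it would need to be checked with Explain(analyze, buffers).

```sql
-- Sketch only: expressions copied from the first query above.
-- The alias total_count is hypothetical.
SELECT
    content::text,
    (
        -- Uncorrelated scalar subquery: computed once, apart from the paged rows.
        SELECT count(*)
        FROM B2HEAD.item
        WHERE cluster = 'B2BOX'
          AND concept = $b2$Test$b2$
          AND to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' '))
              @@ to_tsquery('english', 'MNJ4989')
    ) AS total_count
FROM B2HEAD.item
WHERE cluster = 'B2BOX'
  AND concept = $b2$Test$b2$
  AND to_tsvector('english', array_to_string_i(xpath('/Test/BusinessDocNumber//text()', content)::text[], ' '))
      @@ to_tsquery('english', 'MNJ4989')
ORDER BY (array_to_string_i(xpath('/Test/CreationDate//text()', content)::text[], ' ')::varchar(256)) DESC
OFFSET 0 LIMIT 50;
```

The trade-off is that the filter is evaluated twice, once for the count and once for the page of rows, and the query generator would have to be able to emit this shape.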
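Working on the estimates; will keep this page updated.
We give up!
Changing the limit of the first query above changes the behaviour! Increasing the limit makes the planner use the GIST index, even though the query actually returns a single row.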
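The sensitivity to the limit is consistent with how the Limit node is costed in the first plan: its total cost looks like a linear interpolation of the child node's cost, startup + (total - startup) * limit / estimated rows. The numbers below are taken from the first plan (WindowAgg cost=0.43..784801.69 rows=5838); the column aliases are only illustrative.

```sql
-- Reproduce the Limit cost from the first plan, then redo it for a larger limit.
SELECT
    round(0.43 + (784801.69 - 0.43) * 50  / 5838, 2) AS limit_50_cost,   -- 6721.92, as shown in the first plan
    round(0.43 + (784801.69 - 0.43) * 100 / 5838, 2) AS limit_100_cost;  -- roughly twice as much
```

With limit 50 the backward scan on item_mi_idx01 is estimated at 6721.92, which is below the 7613.27 shown for the GIST-plus-sort plan (that figure comes from the query without the window function, so it is only a rough point of comparison). Because the planner expects 5838 matching rows, it assumes the backward scan can stop early, while in reality only one row matches. Doubling the limit roughly doubles the interpolated cost, which may be why a larger limit tips the planner towards the GIST index.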
There are too many moving parts in the query planner to try to program anything around it; we were only trying to trick it.