Postgres full-text search performance on common words

Date: 2015-06-22 17:54:57

Tags: sql postgresql search full-text-search

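I'm trying to improve search performance when someone queries for something very common. I have a database of 5.3 million records with their mailing addresses, and a large share of them contain common words such as "road", "rd", "st", and so on, so a search for one of those words takes a long time.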

As shown below, I tried searching for something uncommon ("arrowhead"):

pulsar_dev=# EXPLAIN ANALYZE SELECT
                property->>'rollNumber',
                property->>'municipalAddress',
                property->>'municipalityDescription'
FROM
                properties_cmv
WHERE
                to_tsvector('simple', property->>'municipalAddress') ||
                to_tsvector('simple', property->>'municipalityDescription') ||
                to_tsvector('simple', property->>'countyDescription') @@ plainto_tsquery('arrowhead')
ORDER BY ts_rank(to_tsvector('simple', property->>'municipalAddress') ||
                to_tsvector('simple', property->>'municipalityDescription') ||
                to_tsvector('simple', property->>'countyDescription'), plainto_tsquery('arrowhead')) DESC;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=4420.99..4424.11 rows=1248 width=23) (actual time=136.957..137.047 rows=490 loops=1)
   Sort Key: (ts_rank(((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))), plainto_tsquery('arrowhead'::text)))
   Sort Method: quicksort  Memory: 93kB
   ->  Bitmap Heap Scan on properties_cmv  (cost=25.69..4356.81 rows=1248 width=23) (actual time=0.350..136.566 rows=490 loops=1)
         Recheck Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('arrowhead'::text))
         Heap Blocks: exact=39
         ->  Bitmap Index Scan on prop_address_idx  (cost=0.00..25.38 rows=1248 width=0) (actual time=0.072..0.072 rows=490 loops=1)
               Index Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('arrowhead'::text))
 Planning time: 0.213 ms
 Execution time: 137.184 ms
(10 rows)

That is very fast, but when I search for "road", it is not:

pulsar_dev=# EXPLAIN ANALYZE SELECT
                property->>'rollNumber',
                property->>'municipalAddress',
                property->>'municipalityDescription'
FROM
                properties_cmv
WHERE
                to_tsvector('simple', property->>'municipalAddress') ||
                to_tsvector('simple', property->>'municipalityDescription') ||
                to_tsvector('simple', property->>'countyDescription') @@ plainto_tsquery('road')
ORDER BY ts_rank(to_tsvector('simple', property->>'municipalAddress') ||
                to_tsvector('simple', property->>'municipalityDescription') ||
                to_tsvector('simple', property->>'countyDescription'), plainto_tsquery('road')) DESC;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=25533.10..25560.73 rows=11051 width=23) (actual time=11065.051..11066.883 rows=10356 loops=1)
   Sort Key: (ts_rank(((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))), plainto_tsquery('road'::text)))
   Sort Method: quicksort  Memory: 1841kB
   ->  Bitmap Heap Scan on properties_cmv  (cost=117.67..24790.93 rows=11051 width=23) (actual time=1.911..11052.683 rows=10356 loops=1)
         Recheck Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('road'::text))
         Heap Blocks: exact=1408
         ->  Bitmap Index Scan on prop_address_idx  (cost=0.00..114.91 rows=11051 width=0) (actual time=1.432..1.432 rows=10356 loops=1)
               Index Cond: (((to_tsvector('simple'::regconfig, (property ->> 'municipalAddress'::text)) || to_tsvector('simple'::regconfig, (property ->> 'municipalityDescription'::text))) || to_tsvector('simple'::regconfig, (property ->> 'countyDescription'::text))) @@ plainto_tsquery('road'::text))
 Planning time: 0.210 ms
 Execution time: 11069.142 ms
(10 rows)

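How can I improve the performance of the second query? I also need to rank the results so that the most relevant ones are returned first.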

A similar test run on Elasticsearch returns in milliseconds.

1 answer:

Answer 0 (score: 3)

I created a new table that stores the concatenated tsvector in its own column, indexed that column, and it seems to have improved the speed.
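A minimal sketch of that approach might look like the following. Only properties_cmv and the JSON keys come from the question; the table name properties_search, the column names, and the index name are hypothetical. The sketch also assumes a GIN index, wraps each field in coalesce so that a missing field does not turn the whole vector into NULL, and passes 'simple' to plainto_tsquery to match the configuration used to build the vector. None of these choices are confirmed by the answer.

```sql
-- Hypothetical names: properties_search, tsv, properties_search_tsv_idx.
-- Precompute the concatenated tsvector once instead of on every query.
CREATE TABLE properties_search AS
SELECT
    property->>'rollNumber'              AS roll_number,
    property->>'municipalAddress'        AS municipal_address,
    property->>'municipalityDescription' AS municipality_description,
    -- coalesce is an assumption: to_tsvector(NULL) is NULL, and
    -- concatenating NULL would make the whole vector NULL.
    to_tsvector('simple', coalesce(property->>'municipalAddress', '')) ||
    to_tsvector('simple', coalesce(property->>'municipalityDescription', '')) ||
    to_tsvector('simple', coalesce(property->>'countyDescription', '')) AS tsv
FROM properties_cmv;

-- Index the stored column (GIN assumed here).
CREATE INDEX properties_search_tsv_idx ON properties_search USING gin (tsv);
ANALYZE properties_search;

-- The search and the ranking now read the stored vector
-- rather than re-parsing three JSON fields per matching row.
SELECT roll_number, municipal_address, municipality_description
FROM properties_search
WHERE tsv @@ plainto_tsquery('simple', 'road')
ORDER BY ts_rank(tsv, plainto_tsquery('simple', 'road')) DESC;
```

With a separate table like this, the stored vectors have to be rebuilt or kept in sync whenever properties_cmv changes; the answer does not say how that was handled.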