I am using Postgres 10.6 and have a table of about 30 million addresses. I have combined the address fields into a single column (textsearch):
create table my_addresses as
SELECT
concat_ws(' ', my_text_id, full_address, postcode, business_name) as address,
to_tsvector('english', concat_ws(' ', my_text_id, full_address, postcode, business_name)) as textsearch
from my_data;
On top of this, I created an index:
CREATE INDEX ON my_addresses USING GIN (textsearch);
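For reference, a GIN index on a tsvector column is only used by full-text operators such as @@. Here is a minimal sketch of a query that could use it, assuming the same 'english' configuration as in the table definition. Note that plainto_tsquery requires every search term to match as a whole lexeme, so this is not fuzzy matching:

```sql
-- Sketch only: a full-text query that can use the GIN index on textsearch.
-- plainto_tsquery ANDs all terms together, so every word must be present.
SELECT address,
       ts_rank(textsearch, query) AS rank
FROM my_addresses,
     plainto_tsquery('english', 'flat 11 peabody se17 1bt') AS query
WHERE textsearch @@ query
ORDER BY rank DESC
LIMIT 10;
```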
What I want to achieve is free-text search (similar to Google), where I can type in any text and get back the 10 most similar addresses. So I run this query:
SELECT address, similarity('flat 11 peabody se17 1bt', address) AS simil
FROM my_addresses
ORDER BY simil DESC
LIMIT 10;
The results look reasonable, but the query is very slow and does not even use the index.
So this time I created an alternative index, based on the address field:
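One way to confirm how the query is executed is to look at its plan. The sketch below is not output from my system. It only shows the command. Since similarity() is evaluated for every row and there is no WHERE clause using an indexable operator, a sequential scan over all rows would be expected here:

```sql
-- Sketch only: inspect the actual plan, timing and buffer usage.
-- Caution: EXPLAIN ANALYZE really executes the query, so it takes as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT address, similarity('flat 11 peabody se17 1bt', address) AS simil
FROM my_addresses
ORDER BY simil DESC
LIMIT 10;
```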
CREATE INDEX ON my_addresses USING GIN (address gin_trgm_ops);
and changed the query as follows:
SELECT address, similarity('flat 11 peabody se17 1bt', address) AS simil
FROM my_addresses
WHERE address % 'flat 11 peabody se17 1bt'
ORDER BY simil DESC
LIMIT 10;
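For context, the % operator returns true when the similarity exceeds pg_trgm.similarity_threshold, which defaults to 0.3. Below is a minimal sketch of raising that threshold for the session so that fewer candidate rows have to be fetched and sorted. The value 0.5 is an arbitrary assumption for illustration, and a higher threshold also drops weaker matches from the result:

```sql
-- Sketch only: pg_trgm must be installed for the % operator and gin_trgm_ops.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- 0.5 is an illustrative value, not a recommendation; the default is 0.3.
SET pg_trgm.similarity_threshold = 0.5;

SELECT address, similarity('flat 11 peabody se17 1bt', address) AS simil
FROM my_addresses
WHERE address % 'flat 11 peabody se17 1bt'
ORDER BY simil DESC
LIMIT 10;
```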
This time the index is used, but the query is still slow (about 1 minute).
So I tried a new index:
CREATE INDEX ON my_addresses USING GIST (address gist_trgm_ops);
and changed the query again, as follows:
SELECT address, address <-> 'flat 11 peabody se17 1bt' AS dist
FROM my_addresses
ORDER BY dist LIMIT 10;
But it is still slow.
Is there a way to make this kind of query faster?