Question

我有一个表，其GIN索引类似于：

CREATE INDEX my_index ON my_table USING GIN (to_tsvector('english', lower(
    (jsondoc->>'StreetAddress') 
    || ' ' || (jsondoc->>'Neighborhood')
    || ' ' || (jsondoc->>'Area')
    || ' ' || (jsondoc->>'CrossStreet')
    || ' ' || (jsondoc->>'Community')
    || ' ' || (jsondoc->>'Complex') )))
where source = 'x';

我的表有250万行。

假设我要查询单词banana，该单词在该数据集中不是很常见。实际上，它出现了8次。

现在这是奇怪的部分。

此查询需要可重复的 3整秒来执行：

select * from my_table where
(source='x' AND plainto_tsquery('english', 'banana') @@ to_tsvector('english', lower(
    (jsondoc->>'StreetAddress') 
    || ' ' || (jsondoc->>'Neighborhood')
    || ' ' || (jsondoc->>'Area')
    || ' ' || (jsondoc->>'CrossStreet')
    || ' ' || (jsondoc->>'Community')
    || ' ' || (jsondoc->>'Complex') )) AND (jsondoc->>'L_AskingPrice')::real>=250000 AND ((jsondoc->>'L_AskingPrice')::real<=2500000 OR (jsondoc->>'L_asking_price_low')::real<=2500000) AND jsondoc->>
'L_Status' IN ('ACTIVE','BACK ON MARKET') AND iscurrent='true')

这可以在不到 50毫秒的时间内可靠执行：

select * from my_table where
(source='x' AND plainto_tsquery('english', 'banana banana') @@ to_tsvector('english', lower(
    (jsondoc->>'StreetAddress') 
    || ' ' || (jsondoc->>'Neighborhood')
    || ' ' || (jsondoc->>'Area')
    || ' ' || (jsondoc->>'CrossStreet')
    || ' ' || (jsondoc->>'Community')
    || ' ' || (jsondoc->>'Complex') )) AND (jsondoc->>'L_AskingPrice')::real>=250000 AND ((jsondoc->>'L_AskingPrice')::real<=2500000 OR (jsondoc->>'L_asking_price_low')::real<=2500000) AND jsondoc->>
'L_Status' IN ('ACTIVE','BACK ON MARKET') AND iscurrent='true')

您能发现差异吗？

是：在第二个查询中重复单词香蕉。现在，这只是一个例子。通过其他测试，我发现任何一个单词基本上都需要很长时间（〜3秒），而任何双字搜索（包括2个不同的单词）运行速度都很快（〜50ms）。

PostgreSQL版本是最新版本（v11）。

是什么原因造成的？

查询计划

慢（一个字）

快速（一个字重复两次）

全文搜索对单个稀有单词的运行速度较慢，但在重复相同单词的情况下运行速度很快？

0 个答案:

全文搜索对单个稀有单词的运行速度较慢，但​​在重复相同单词的情况下运行速度很快？

0 个答案:

全文搜索对单个稀有单词的运行速度较慢，但在重复相同单词的情况下运行速度很快？