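So basically I set up a very simple test table to try out the trigram and full-text indexing features of PostgreSQL 9.1 (stock Debian stable).
Here are the table and index definitions: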
-- Table: fulltextproba
-- DROP TABLE fulltextproba;
CREATE TABLE fulltextproba
(
id integer NOT NULL,
text text,
varchar600 character varying(600) COLLATE pg_catalog."C.UTF-8",
CONSTRAINT id PRIMARY KEY (id )
)
WITH (
OIDS=FALSE
);
-- Index: id_index
-- DROP INDEX id_index;
CREATE UNIQUE INDEX id_index
ON fulltextproba
USING btree
(id );
-- Index: text_gin_fulltext_hun
-- DROP INDEX text_gin_fulltext_hun;
CREATE INDEX text_gin_fulltext_hun
ON fulltextproba
USING gin
(to_tsvector('hungarian'::text, text) );
-- Index: text_gin_trgm
-- DROP INDEX text_gin_trgm;
CREATE INDEX text_gin_trgm
ON fulltextproba
USING gin
(text COLLATE pg_catalog."C.UTF-8" gin_trgm_ops);
-- Index: varchar600
-- DROP INDEX varchar600;
CREATE INDEX varchar600
ON fulltextproba
USING btree
(varchar600 COLLATE pg_catalog."C.UTF-8" varchar_pattern_ops);
-- Index: varchar600_gin_trgm
-- DROP INDEX varchar600_gin_trgm;
CREATE INDEX varchar600_gin_trgm
ON fulltextproba
USING gin
(varchar600 COLLATE pg_catalog."C.UTF-8" gin_trgm_ops);
My problem is that a %foo% search, which should use the trigram index, does not use it when I search the text column:
SELECT COUNT(id) FROM public.fulltextproba WHERE text LIKE '%almáv%'
count
-------
396
(1 row)
real 0m7.215s
user 0m0.020s
sys 0m0.004s
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=657056.11..657056.12 rows=1 width=4)
-> Seq Scan on fulltextproba (cost=0.00..657052.72 rows=1355 width=4)
Filter: (text ~~ '%almáv%'::text)
(3 rows)
But if I search the varchar600 column, it does use the trigram index and, not surprisingly, it is much faster:
SELECT COUNT(id) FROM public.fulltextproba WHERE varchar600 LIKE '%almáv%'
count
-------
373
(1 row)
real 0m0.184s
user 0m0.052s
sys 0m0.004s
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=5283.11..5283.12 rows=1 width=4)
-> Bitmap Heap Scan on fulltextproba (cost=62.50..5279.73 rows=1355 width=4)
Recheck Cond: ((varchar600)::text ~~ '%almáv%'::text)
-> Bitmap Index Scan on varchar600_gin_trgm (cost=0.00..62.16 rows=1355 width=0)
Index Cond: ((varchar600)::text ~~ '%almáv%'::text)
(5 rows)
So the ultimate question is:
Answer:
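The data type text is perfectly fine. In fact, the EXPLAIN output shows that even the varchar600 column is cast to text for the index condition: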
Index Cond: ((varchar600)::text ~~ '%almáv%'::text)
The immediate cause is probably a collation mismatch. Your table defines:
text text, -- default collation is ???
varchar600 character varying(600) COLLATE pg_catalog."C.UTF-8"
while both trigram indexes use COLLATE pg_catalog."C.UTF-8". What is your default collation? Check the output of:
SHOW LC_COLLATE;
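To see which collation each column and each index actually uses, you could also query the catalogs. A sketch using the standard information_schema.columns and pg_indexes views; collation_name comes back NULL for a column that uses the database default:

```sql
-- Collation of each column (NULL = database default collation).
SELECT column_name, data_type, collation_name
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'fulltextproba';

-- Index definitions: pg_get_indexdef() (used by this view) prints an
-- explicit COLLATE clause when an index column's collation differs from
-- the collation of the underlying table column.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'fulltextproba';
```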
You may be mixing different collations. Retest with:
SELECT COUNT(id) FROM public.fulltextproba
WHERE text COLLATE pg_catalog."C.UTF-8" LIKE '%almáv%'
Read about collation support in Postgres.
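If SHOW LC_COLLATE reports something other than C.UTF-8, you can also remove the mismatch permanently instead of adding COLLATE to every query. A sketch, assuming the text column should keep the database default collation; the commented-out alternative instead mirrors what you already did for varchar600:

```sql
-- Option 1: rebuild the trigram index without an explicit COLLATE clause,
-- so it uses the column's own (default) collation and can match a plain
-- "WHERE text LIKE '%...%'" predicate.
DROP INDEX text_gin_trgm;
CREATE INDEX text_gin_trgm
  ON fulltextproba
  USING gin
  (text gin_trgm_ops);

-- Option 2 (alternative): give the column itself the index's collation,
-- as for varchar600. Note this also changes how ORDER BY sorts the column.
-- ALTER TABLE fulltextproba
--   ALTER COLUMN text TYPE text COLLATE pg_catalog."C.UTF-8";
```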
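You obviously have different values in the two columns (the same search returns 396 rows for text but 373 for varchar600). Repeat the test with identical values.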
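A quick way to confirm that the columns differ, sketched against your table:

```sql
-- Count rows where the two columns differ (NULL vs. non-NULL counts too).
-- The explicit COLLATE avoids a possible "could not determine which
-- collation to use" error caused by the two columns having different
-- implicit collations.
SELECT count(*) AS differing_rows
FROM public.fulltextproba
WHERE text IS DISTINCT FROM varchar600::text COLLATE pg_catalog."C.UTF-8";
```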
To force Postgres to use the index, you can (only in your session, for debugging!):
SET enable_seqscan = off;
Then try again.
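A slightly safer variant of the same debugging trick, sketched below: SET LOCAL only lasts until the end of the current transaction, so the setting cannot leak into the rest of the session. If the plan still shows a Seq Scan even with sequential scans disabled, the planner found no index it can match to the predicate at all, which points to the collation mismatch rather than to cost estimates.

```sql
BEGIN;
-- Discourage sequential scans for this transaction only (debugging aid).
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
SELECT COUNT(id) FROM public.fulltextproba WHERE text LIKE '%almáv%';
-- Ending the transaction also discards the SET LOCAL.
ROLLBACK;
```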
The upcoming Postgres 9.4 ships with major improvements for GIN indexes. In particular, they will be smaller and faster.