Question

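I have a simple setup with a single table, into which I imported some users with a short description (bio) of up to 255 characters. Next to the description field I have another field of type tsvector, which holds the same bio in tokenized form.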

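My goal is to find user bios that contain certain keywords. This is a full-text search on a single table, so I'm not sure whether I even need the tsvector field (I'm new to Postgres and have only used MySQL in the past), since it seems to me that its real strength is querying across different fields/tables.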

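My actual problem is that the query runs fine when no ordering is involved (0.5s), but becomes very slow (8s) as soon as I add a single ORDER BY clause on an indexed field. I only have about a million records.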

Table setup:

CREATE TABLE public.django_user
(
    id integer NOT NULL DEFAULT nextval('django_user_id_seq'::regclass),
    username character varying(32) COLLATE pg_catalog."default" NOT NULL,
    description character varying(255) COLLATE pg_catalog."default" NOT NULL,
    description_tokens tsvector,
    streams_count integer NOT NULL,
    CONSTRAINT django_user_pkey PRIMARY KEY (id),
    CONSTRAINT django_user_streams_count_check CHECK (streams_count >= 0)
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX django_user_description_tokens_07422d46
    ON public.django_user USING btree
    (description_tokens)
    TABLESPACE pg_default;

CREATE INDEX django_user_streams_count_66aa1edc
    ON public.django_user USING btree
    (streams_count)
    TABLESPACE pg_default;

The slow query:

SELECT 
    streams_count, username, description
FROM
    "django_user" 
WHERE 
    to_tsvector('english'::regconfig, COALESCE(("django_user"."description_tokens")::text, '')) @@ (plainto_tsquery('english'::regconfig, 'react redux')) = true 
ORDER BY streams_count ASC LIMIT 20

If I remove ORDER BY streams_count ASC, everything works fine. Here is the EXPLAIN ANALYZE output for the query:

"Limit  (cost=174377.42..174379.75 rows=20 width=106) (actual time=7363.660..7368.257 rows=20 loops=1)"
"  ->  Gather Merge  (cost=174377.42..174379.99 rows=22 width=106) (actual time=7363.658..7368.245 rows=20 loops=1)"
"        Workers Planned: 2"
"        Workers Launched: 2"
"        ->  Sort  (cost=173377.40..173377.42 rows=11 width=106) (actual time=7359.708..7359.710 rows=15 loops=3)"
"              Sort Key: streams_count"
"              Sort Method: top-N heapsort  Memory: 31kB"
"              Worker 0:  Sort Method: top-N heapsort  Memory: 32kB"
"              Worker 1:  Sort Method: top-N heapsort  Memory: 32kB"
"              ->  Parallel Seq Scan on django_user  (cost=0.00..173377.21 rows=11 width=106) (actual time=24.870..7359.379 rows=109 loops=3)"
"                    Filter: (to_tsvector('english'::regconfig, COALESCE((description_tokens)::text, ''::text)) @@ '''react'' & ''redux'''::tsquery)"
"                    Rows Removed by Filter: 347231"
"Planning Time: 0.298 ms"
"Execution Time: 7368.293 ms"

Any idea what I'm missing?

Answer 1

A btree index on a tsvector column is pretty much useless. Make it a GIN index, so that the query can use it through the @@ operator.
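A minimal sketch of that change, under two assumptions that the question does not confirm: that description_tokens was populated with to_tsvector('english', description), and that the new index name (django_user_description_tokens_gin, made up here) is free. Note that the query in the question wraps the column in to_tsvector(...::text); an index on the bare column generally cannot be matched against such an expression, so the sketch queries the stored tsvector directly.

```sql
-- Drop the btree index on the tsvector column; it cannot serve @@ lookups.
DROP INDEX IF EXISTS public.django_user_description_tokens_07422d46;

-- GIN index on the stored tsvector (index name is hypothetical).
CREATE INDEX django_user_description_tokens_gin
    ON public.django_user USING gin (description_tokens);

-- Match against the column itself rather than re-parsing it with
-- to_tsvector(), so the planner can consider the GIN index.
-- Assumes description_tokens holds to_tsvector('english', description).
SELECT streams_count, username, description
FROM public.django_user
WHERE description_tokens @@ plainto_tsquery('english'::regconfig, 'react redux')
ORDER BY streams_count ASC
LIMIT 20;
```

If the index is used, only the few hundred matching rows should need to be fetched and sorted, instead of the whole table being scanned.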

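As for why ORDER BY makes it slower: consider that finding the 20 tallest people in Chicago wearing red ties is much harder than finding 20 random people in Chicago wearing red ties. Without the ORDER BY, the scan can stop as soon as it has found any 20 matching rows; with it, every matching row has to be found before the 20 with the lowest streams_count can be picked.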
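One way to observe this difference is to compare the plans of the two variants. The following is only a sketch against the table from the question; it queries the stored column directly, which assumes description_tokens holds to_tsvector('english', description). Keep in mind that EXPLAIN with ANALYZE actually executes the statement.

```sql
-- Without ORDER BY: the scan is free to stop after the first 20 matches.
EXPLAIN (ANALYZE, BUFFERS)
SELECT streams_count, username, description
FROM public.django_user
WHERE description_tokens @@ plainto_tsquery('english'::regconfig, 'react redux')
LIMIT 20;

-- With ORDER BY: all matching rows must be collected before the
-- 20 smallest streams_count values can be returned.
EXPLAIN (ANALYZE, BUFFERS)
SELECT streams_count, username, description
FROM public.django_user
WHERE description_tokens @@ plainto_tsquery('english'::regconfig, 'react redux')
ORDER BY streams_count ASC
LIMIT 20;
```

In the plan posted in the question, the filter keeps roughly 109 rows per process across three processes and discards about 347,000 per process, so nearly all of the time goes into scanning and filtering the table rather than into the sort itself.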
