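I am currently optimizing search results on a jsonb field in PostgreSQL. I am using Postgres 9.6. My end goal is to search multiple fields in my jsonb documents and rank the results by the total number of hits across all fields. But I am stuck, because the ts_rank function does not use my index and slows the search down enormously. Here is a minimal example: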
CREATE TABLE book (
id BIGSERIAL NOT NULL,
data JSONB NOT NULL
);
CREATE INDEX book_title_idx
ON book USING GIN (to_tsvector('english', book.data ->> 'title'));
INSERT INTO book (data)
VALUES (CAST('{"title": "Cats"}' AS JSONB));
To search the title field, I am using this query:
EXPLAIN ANALYZE
SELECT *
FROM (
SELECT
id,
data ->> 'title' AS title,
ts_rank(title_query, 'cat:*') AS score
FROM
book,
to_tsvector('english', data ->> 'title') title_query
WHERE title_query @@ to_tsquery('cat:*')
ORDER BY score DESC) a
WHERE score > 0
ORDER BY score DESC;
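On my real data this takes < 1 ms without ranking and ~1800 ms with ranking. The more fields I search, the worse it gets. I need the ranking so that hits in multiple fields count for more.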
Answer 0 (score: 2)
Your query produces this plan (on a test dataset of 500000 rows):
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=216058.57..217308.57 rows=500001 width=63) (actual time=831.033..831.033 rows=1 loops=1)
Sort Key: (ts_rank(title_query.title_query, '''cat'':*'::tsquery)) DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.25..149927.55 rows=500001 width=63) (actual time=4.410..830.950 rows=1 loops=1)
-> Seq Scan on book (cost=0.00..8677.01 rows=500001 width=31) (actual time=0.024..30.159 rows=500001 loops=1)
-> Function Scan on to_tsvector title_query (cost=0.25..0.52 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=500001)
Filter: ((ts_rank(title_query, '''cat'':*'::tsquery) > '0'::double precision) AND (title_query @@ to_tsquery('cat:*'::text)))
Rows Removed by Filter: 1
Planning time: 37.211 ms
Execution time: 831.279 ms
(10 rows)
Replace the alias title_query in the WHERE clause with the expression used in the index definition:
EXPLAIN ANALYZE
SELECT *
FROM (
SELECT
id,
data ->> 'title' AS title,
ts_rank(title_query, 'cat:*') AS score
FROM
book,
to_tsvector('english', data ->> 'title') title_query
WHERE to_tsvector('english', data ->> 'title') @@ to_tsquery('cat:*')
ORDER BY score DESC) a
WHERE score > 0
ORDER BY score DESC;
Sort (cost=9905.39..9930.39 rows=10000 width=63) (actual time=1.069..1.069 rows=1 loops=1)
Sort Key: (ts_rank(title_query.title_query, '''cat'':*'::tsquery)) DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=114.00..9241.00 rows=10000 width=63) (actual time=1.049..1.050 rows=1 loops=1)
-> Bitmap Heap Scan on book (cost=113.75..8940.75 rows=10000 width=31) (actual time=0.052..0.052 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (data ->> 'title'::text)) @@ to_tsquery('cat:*'::text))
Heap Blocks: exact=1
-> Bitmap Index Scan on book_title_idx (cost=0.00..111.25 rows=10000 width=0) (actual time=0.047..0.047 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (data ->> 'title'::text)) @@ to_tsquery('cat:*'::text))
-> Function Scan on to_tsvector title_query (cost=0.25..0.27 rows=1 width=32) (actual time=0.994..0.994 rows=1 loops=1)
Filter: (ts_rank(title_query, '''cat'':*'::tsquery) > '0'::double precision)
Planning time: 0.639 ms
Execution time: 1.120 ms
(13 rows)
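The same idea can probably be written without the function scan in the FROM clause. The following is only a sketch, assuming the book table and book_title_idx index from the question. It passes 'english' explicitly to to_tsquery and drops the outer score > 0 filter; both are my own simplifications rather than part of the original query, and I have not benchmarked this variant.

```sql
-- Sketch only: same table and index as in the question.
SELECT
  id,
  data ->> 'title' AS title,
  -- ts_rank is evaluated only for the rows that pass the WHERE clause.
  ts_rank(to_tsvector('english', data ->> 'title'),
          to_tsquery('english', 'cat:*')) AS score
FROM book
-- The left-hand expression is identical to the one in the book_title_idx
-- definition, which is what lets the planner consider the GIN index.
WHERE to_tsvector('english', data ->> 'title') @@ to_tsquery('english', 'cat:*')
ORDER BY score DESC;
```

Running it under EXPLAIN ANALYZE should show whether the Bitmap Index Scan on book_title_idx is still chosen for your data.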