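I'm trying to get PostgreSQL to use an index for prefix searches with full-text search. It generally works, but only if I create the index after importing the data. Maybe this is expected behavior, but I don't understand it.
First I create the index, then I import the data with the COPY command: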
CREATE INDEX account_fts_idx ON account
USING gin(to_tsvector('german', remote_id || ' ' || name || ' ' || street || ' ' || zip || ' ' || city ));
COPY account (id, remote_id, name, street, zip, city ...) FROM '/path/account.csv' WITH DELIMITER ',' CSV;
Then I run a PREFIX search (that may be the important part) with the following SELECT statement:
EXPLAIN ANALYZE SELECT a.id, a.remote_id, a.name, a.street, a.zip, a.city, al.latitude, al.longitude
FROM account a
LEFT JOIN account_location al ON al.id = a.id
WHERE (to_tsvector('german', a.remote_id || ' ' || a.name || ' ' || a.street || ' ' || a.zip || ' ' || a.city)
@@ (to_tsquery('german', 'hambu:*')))
This performs poorly because the index is not used:
Hash Left Join (cost=28.00..3389.97 rows=319 width=94) (actual time=1.685..1237.674 rows=1336 loops=1)
Hash Cond: (a.id = al.id)
-> Seq Scan on account a (cost=0.00..3360.73 rows=319 width=78) (actual time=1.665..1236.589 rows=1336 loops=1)
Filter: (to_tsvector('german'::regconfig, (((((((((remote_id)::text || ' '::text) || (name)::text) || ' '::text) || (street)::text) || ' '::text) || (zip)::text) || ' '::text) || (city)::text)) @@ '''hambu'':*'::tsquery)
-> Hash (cost=18.00..18.00 rows=800 width=24) (actual time=0.001..0.001 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 0kB
-> Seq Scan on account_location al (cost=0.00..18.00 rows=800 width=24) (actual time=0.001..0.001 rows=0 loops=1)
Total runtime: 1237.928 ms
Now the strange part: if I drop the index and recreate it with the same CREATE INDEX command, the same SELECT query uses the index and is very fast.
Hash Left Join (cost=61.92..1290.73 rows=1278 width=94) (actual time=0.561..1.918 rows=1336 loops=1)
Hash Cond: (a.id = al.id)
-> Bitmap Heap Scan on account a (cost=33.92..1257.78 rows=1278 width=78) (actual time=0.551..1.442 rows=1336 loops=1)
Recheck Cond: (to_tsvector('german'::regconfig, (((((((((remote_id)::text || ' '::text) || (name)::text) || ' '::text) || (street)::text) || ' '::text) || (zip)::text) || ' '::text) || (city)::text)) @@ '''hambu'':*'::tsquery)
-> Bitmap Index Scan on account_fts_idx (cost=0.00..33.60 rows=1278 width=0) (actual time=0.490..0.490 rows=1336 loops=1)
Index Cond: (to_tsvector('german'::regconfig, (((((((((remote_id)::text || ' '::text) || (name)::text) || ' '::text) || (street)::text) || ' '::text) || (zip)::text) || ' '::text) || (city)::text)) @@ '''hambu'':*'::tsquery)
-> Hash (cost=18.00..18.00 rows=800 width=24) (actual time=0.001..0.001 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 0kB
-> Seq Scan on account_location al (cost=0.00..18.00 rows=800 width=24) (actual time=0.001..0.001 rows=0 loops=1)
Total runtime: 2.054 ms
So why does the index have to be created after the import?
And more important to me: will new rows (usually added via INSERT INTO) be added to the index?
Answer 0 (score: 2)
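For tables with GIN indexes, VACUUM (in any form) also completes any pending index insertions, by moving pending index entries to the appropriate places in the main GIN index structure. (PostgreSQL Documentation: VACUUM)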
After running VACUUM account, the SELECT query uses the index as expected.
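As for the second question: as far as I know, rows added with INSERT are picked up by the index automatically. With GIN's fastupdate option (on by default) they first land in the pending list mentioned in the quote, which queries still search, so no rows should be missed. Below is a rough sketch of what could be run after a bulk load. It only uses the table and index names from the question. The pgstattuple check assumes that contrib extension is installed and that the server is recent enough to provide pgstatginindex. Adding ANALYZE and turning off fastupdate are my own suggestions, not part of the quoted documentation.

```sql
-- Optional check (assumes the pgstattuple contrib extension is available):
-- pending_pages / pending_tuples show how much of the index still sits in the pending list.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT pending_pages, pending_tuples FROM pgstatginindex('account_fts_idx');

-- After the bulk load: move pending entries into the main GIN structure
-- and refresh the planner statistics in the same pass.
VACUUM ANALYZE account;

-- Alternative (an assumption on my part, not from the quoted docs):
-- disable the pending list so new entries go straight into the main index,
-- at the cost of slower individual INSERTs.
ALTER INDEX account_fts_idx SET (fastupdate = off);
VACUUM account;  -- empties whatever is still pending after the change
```

Re-running the EXPLAIN ANALYZE from the question afterwards should show whether the planner now picks the Bitmap Index Scan on account_fts_idx.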