Question

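I have two tables:

table_1, with about 1 million rows and columns id_t1: integer, c1_t1: varchar, etc.
table_2, with about 50 million rows and columns id_t2: integer, ref_id_t1: integer, c1_t2: varchar, etc.

ref_id_t1 is filled with id_t1 values, but the two are not linked by a foreign key because table_2 does not know about table_1.

I need to run a query on both tables like the following: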

SELECT * FROM table_1 t1 WHERE t1.c1_t1= 'A' AND t1.id_t1 IN
(SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE '%abc%');

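Without any changes, or with only basic indexes, the query takes about a minute to complete because a sequential scan is performed on table_2. To avoid this, I created a GIN index with the gin_trgm_ops operator class: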

CREATE EXTENSION pg_trgm;
CREATE INDEX c1_t2_gin_index ON table_2 USING gin (c1_t2 gin_trgm_ops);

However, this does not solve the problem, because the inner query still takes a very long time.

EXPLAIN ANALYSE SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE '%abc%';

gives the following:

Bitmap Heap Scan on table_2 t2 (cost=664.20..189671.00 rows=65058 width=4) (actual time=5101.286..22854.838 rows=69631 loops=1)
  Recheck Cond: ((c1_t2 )::text ~~ '%1.1%'::text)
  Rows Removed by Index Recheck: 49069703
  Heap Blocks: exact=611548
  ->  Bitmap Index Scan on gin_trg  (cost=0.00..647.94 rows=65058 width=0) (actual time=4911.125..4911.125 rows=49139334 loops=1)
        Index Cond: ((c1_t2)::text ~~ '%1.1%'::text)
Planning time: 0.529 ms
Execution time: 22863.017 ms

The Bitmap Index Scan is fast, but because we need t2.ref_id_t1, PostgreSQL has to perform a Bitmap Heap Scan, which is not quick on 65000 rows of data.
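
The plan above shows the Bitmap Index Scan returning about 49 million candidate rows, almost all of which the recheck then discards. One possible way to investigate this is to look at the trigrams pg_trgm derives from the search string. This is only a sketch: it assumes the pattern actually run was '%1.1%', as the plan shows, and it applies the pg_trgm function show_trgm to the plain string, which only approximates what the index extracts from a LIKE pattern.

```sql
-- Trigrams derived from the literal shown in the plan's Recheck Cond.
-- pg_trgm ignores non-alphanumeric characters, so a short string split by
-- punctuation may yield only a few very common trigrams, which would let
-- the GIN index narrow the candidate set only slightly.
SELECT show_trgm('1.1');

-- For comparison, the literal used in the query text above.
SELECT show_trgm('abc');
```

If the first call returns only trigrams that occur in most rows of c1_t2, that would be consistent with the large "Rows Removed by Index Recheck" count.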

The way to avoid the Bitmap Heap Scan would be to perform an Index Only Scan. This is possible with a multicolumn btree index, see https://www.postgresql.org/docs/9.6/static/indexes-index-only-scans.html

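If I change the query to search on the beginning of c1_t2 instead, and create a btree index on c1_t2 and ref_id_t1, the query takes just over a second, even though the inner query returns 90000 rows.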

CREATE INDEX c1_t2_ref_id_t1_index
    ON table_2  USING btree
    (c1_t2 varchar_pattern_ops ASC NULLS LAST, ref_id_t1 ASC NULLS LAST);


EXPLAIN ANALYSE SELECT * FROM table_1 t1 WHERE t1.c1_t1= 'A' AND t1.id_t1 IN
    (SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE 'aaa%');

Hash Join  (cost=56561.99..105233.96 rows=1 width=2522) (actual time=953.647..1068.488 rows=36 loops=1)
  Hash Cond: (t1.id_t1 = t2.ref_id_t1)
  ->  Seq Scan on table_1 t1  (cost=0.00..48669.65 rows=615 width=2522) (actual time=0.088..667.576 rows=790 loops=1)
        Filter: (c1_t1 = 'A')
        Rows Removed by Filter: 1083798
  ->  Hash  (cost=56553.74..56553.74 rows=660 width=4) (actual time=400.657..400.657 rows=69632 loops=1)
        Buckets: 131072 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 3472kB
        ->  HashAggregate  (cost=56547.14..56553.74 rows=660 width=4) (actual time=380.280..391.871 rows=69632 loops=1)
              Group Key: t2.ref_id_t1
              ->  Index Only Scan using c1_t2_ref_id_t1_index on table_2 t2   (cost=0.56..53907.28 rows=1055943 width=4) (actual time=0.014..202.034 rows=974737 loops=1)
                    Index Cond: ((c1_t2  ~>=~ 'aaa'::text) AND (c1_t2  ~<~ 'chb'::text))
                    Filter: ((c1_t2 )::text ~~ 'aaa%'::text)
                    Heap Fetches: 0
Planning time: 1.512 ms
Execution time: 1069.712 ms
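
"Heap Fetches: 0" in the plan above means the Index Only Scan did not have to visit the heap, because the relevant pages were marked all-visible in the visibility map. A minimal sketch of how this could be maintained and re-checked, assuming table_2 receives writes between runs (the question does not say whether it does):

```sql
-- VACUUM updates the visibility map that index-only scans rely on;
-- ANALYZE refreshes the planner statistics at the same time.
VACUUM (ANALYZE) table_2;

-- BUFFERS adds page-access counts to the plan; check that the
-- Index Only Scan node still reports "Heap Fetches: 0".
EXPLAIN (ANALYZE, BUFFERS)
SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE 'aaa%';
```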

However, this is not possible with GIN indexes, because they do not store the full column values in their keys.

Is it possible to use an extension like pg_trgm with a btree index, so that we can get an Index Only Scan for LIKE '%abc%' queries?

Is there a way to use pg_trgm's LIKE support with btree indexes on PostgreSQL?

0 answers: