Question

I have a table with about 300,000 rows and a column of type INT[].

Each array contains about 2,000 elements.

I created an index on this array column:

create index index_name ON table_name USING GIN (column_name)

Then I ran this query:

SELECT COUNT(*)
FROM table_name 
WHERE
column_name @> ARRAY[1777]

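This query runs very slowly (Execution time: 66886.132 ms), and EXPLAIN ANALYZE shows that the GIN index is not used; the planner does a Seq Scan instead.

Why doesn't Postgres use the GIN index? And the main goal: how can I run the query above as fast as possible?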

Edit

This is the result of explain (analyze, verbose) for the query above:

Aggregate  (cost=10000024724.75..10000024724.76 rows=1 width=0) (actual time=61087.513..61087.513 rows=1 loops=1)
  Output: count(*)
  ->  Seq Scan on public.users  (cost=10000000000.00..10000024724.00 rows=300 width=0) (actual time=12104.651..61087.500 rows=5 loops=1)
        Output: id, email, pass, nick, reg_dt, reg_ip, gender, curr_location, about, followed_tag_ids, avatar_img_ext, rep_tag_ids, rep_tag_id_scores, stats, status
        Filter: (users.rep_tag_ids @> '{1777}'::integer[])
        Rows Removed by Filter: 299995
Planning time: 0.110 ms
Execution time: 61087.564 ms

Here are the table and index definitions:

CREATE TABLE users
(
  id serial PRIMARY KEY,
  rep_tag_ids integer[] DEFAULT '{}'
  -- other columns here
);

create index users_rep_tag_ids_idx ON users USING GIN (rep_tag_ids);

Answer 1

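You should help the query optimizer use the index. Install the intarray extension for PostgreSQL if you don't have it yet, then recreate the index using the gin__int_ops operator class: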

DROP INDEX users_rep_tag_ids_idx;
CREATE INDEX users_rep_tag_ids_idx ON users USING gin (rep_tag_ids gin__int_ops);
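
For completeness, here is a rough sketch of the steps around the index rebuild: installing the extension beforehand and checking the plan afterwards. It assumes the table and column names from the question and that your role is allowed to create extensions in this database; treat it as an outline rather than a tested recipe for your setup.

```sql
-- Step 1 (run BEFORE the DROP/CREATE INDEX statements above):
-- gin__int_ops is provided by the intarray extension, so it must exist first.
-- Assumption: the current role has the privileges needed to create extensions.
CREATE EXTENSION IF NOT EXISTS intarray;

-- Step 2 (run AFTER the index has been recreated):
-- refresh planner statistics for the table.
ANALYZE users;

-- Step 3: re-check the plan for the original query.
EXPLAIN (ANALYZE, VERBOSE)
SELECT COUNT(*)
FROM users
WHERE rep_tag_ids @> ARRAY[1777];
```

If the index is picked up, the plan should show a Bitmap Index Scan on users_rep_tag_ids_idx feeding a Bitmap Heap Scan instead of the Seq Scan from the question.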
