无法使此PostgreSQL查询更好地工作

时间:2018-01-18 17:56:15

标签: sql postgresql

我使用PostGIS 2.3.3运行PostgreSQL 9.6

我试图让这个相当关键的查询变得更快(寻找1000米范围内的用户),但我在设置正确的索引方面遇到了麻烦。

有人能指出我正确的方向吗?

users为200k行,locations为1200行,user_push_tokens为155k行

用户表:

create table users
(
  id serial not null
    constraint users_pkey
      primary key,
  (an additional 20-ish columns),
  geo_point geometry(Point,4326)
);`

create unique index users_id
  on users (id);

create index users_geo_point_idx
  on users using gist(geo_point);

user_push_tokens表:

create table user_push_tokens
(
  user_id integer not null
    constraint push_tokens_user_id_fkey
      references users
        on delete cascade,
  push_token varchar(255) not null,
  push_provider varchar(64) not null,
  app varchar(64) default 'app'::character varying not null,
  id integer default nextval('user_push_tokens_id_seq'::regclass) not null
    constraint user_push_tokens_pkey
      primary key,
  active boolean default true not null,
);

create index trinity_unique_index
  on user_push_tokens (user_id, app, push_token);

create index user_push_tokens_token_fetch_idx
  on user_push_tokens (user_id, app, active);

create index user_id_index
  on user_push_tokens (user_id);

位置表:

create table locations
(
  id integer default nextval('locations_id_seq'::regclass) not null
    constraint locations_pkey
      primary key,
  (another 25 columns),
  geo_point geometry(Point,4326)
);

create unique index locations_id_key
  on locations (id);

create index locations_geo_point_idx
  on locations using gist(geo_point);

查询

EXPLAIN ANALYSE SELECT
    u.id AS user_id,
    upt.push_provider,
    upt.push_token
  FROM users u
    JOIN locations l ON l.id = 3896
    JOIN user_push_tokens upt
      ON upt.user_id = u.id AND upt.active = true AND upt.app = 'app'
  WHERE ST_DistanceSphere(u.geo_point, l.geo_point) <= 1000;

结果

Nested Loop  (cost=26087.06..63658.87 rows=30605 width=107) (actual time=353.304..887.371 rows=2498 loops=1)
  Join Filter: (_st_distance(geography(u.geo_point), geography(l.geo_point), '0'::double precision, false) <= '1000'::double precision)
  Rows Removed by Join Filter: 89539
  ->  Index Scan using locations_id_key on locations l  (cost=0.28..8.29 rows=1 width=32) (actual time=0.009..0.014 rows=1 loops=1)
        Index Cond: (id = 3896)
  ->  Hash Join  (cost=26086.78..39090.07 rows=91815 width=139) (actual time=352.437..657.228 rows=92037 loops=1)
        Hash Cond: (upt.user_id = u.id)
        ->  Seq Scan on user_push_tokens upt  (cost=0.00..7162.82 rows=91815 width=107) (actual time=0.032..103.512 rows=92037 loops=1)
              Filter: (active AND ((app)::text = 'app'::text))
              Rows Removed by Filter: 62437
        ->  Hash  (cost=22114.46..22114.46 rows=195546 width=36) (actual time=352.199..352.199 rows=195589 loops=1)
              Buckets: 65536  Batches: 4  Memory Usage: 3563kB
              ->  Seq Scan on users u  (cost=0.00..22114.46 rows=195546 width=36) (actual time=0.014..214.976 rows=195589 loops=1)

More readable explain output

感谢您的时间。

1 个答案:

答案 0 :(得分:1)

像这样的索引会使它稍快一些:

CREATE INDEX ON user_push_tokens (app) WHERE active;

但是没有办法通过函数调用来加速嵌套循环。

要研究的一个想法是添加第二个“宽松”距离检查,该检查是可索引的,可能有误报,但过滤掉大部分行。

不确定这是否可行,但值得考虑。