Question

I'm trying to optimize the following join query:

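A notification is a record indicating whether a user has read a given activity. Each notification points to one activity, but many users can be notified about the same activity. The activity record contains columns such as the workspace the activity belongs to and the activity type.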

This query fetches a user's read, non-comment notifications in a specific workspace, sorted by time with the newest first.

explain analyze
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true

and notification.activity_id = activity.id

and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;

The problem is that this query has to go through every notification the user has received.

Limit  (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.767..138.779 rows=20 loops=1)
  ->  Sort  (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.766..138.770 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Nested Loop  (cost=32.57..4912.34 rows=1 width=24) (actual time=1.354..138.606 rows=447 loops=1)
              ->  Bitmap Heap Scan on notification  (cost=32.01..3847.48 rows=124 width=16) (actual time=1.341..6.639 rows=1218 loops=1)
                    Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Filter: read
                    Rows Removed by Filter: 4101
                    Heap Blocks: exact=4774
                    ->  Bitmap Index Scan on notification_user_id_idx  (cost=0.00..31.98 rows=988 width=0) (actual time=0.719..0.719 rows=5355 loops=1)
                          Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
              ->  Index Scan using activity_pkey on activity  (cost=0.56..8.59 rows=1 width=24) (actual time=0.108..0.108 rows=0 loops=1218)
                    Index Cond: (id = notification.activity_id)
                    Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
                    Rows Removed by Filter: 1
Planning time: 0.428 ms
Execution time: 138.825 ms

Edit: here is the performance after warming the cache.

Limit  (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.618..13.629 rows=20 loops=1)
  ->  Sort  (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.617..13.621 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Nested Loop  (cost=32.57..4912.34 rows=1 width=24) (actual time=1.365..13.447 rows=447 loops=1)
              ->  Bitmap Heap Scan on notification  (cost=32.01..3847.48 rows=124 width=16) (actual time=1.352..6.606 rows=1218 loops=1)
                    Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Filter: read
                    Rows Removed by Filter: 4101
                    Heap Blocks: exact=4774
                    ->  Bitmap Index Scan on notification_user_id_idx  (cost=0.00..31.98 rows=988 width=0) (actual time=0.729..0.729 rows=5355 loops=1)
                          Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
              ->  Index Scan using activity_pkey on activity  (cost=0.56..8.59 rows=1 width=24) (actual time=0.005..0.005 rows=0 loops=1218)
                    Index Cond: (id = notification.activity_id)
                    Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
                    Rows Removed by Filter: 1
Planning time: 0.438 ms
Execution time: 13.673 ms

I could create a multicolumn index on user_id and read, but that doesn't solve the problem I'm trying to solve.

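I could solve this by manually denormalizing the data, adding space_id, type and end_time columns to the notification records, but that seems like it shouldn't be necessary.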
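For reference, here is a minimal sketch of what that manual denormalization could involve in PostgreSQL: copy the three columns onto notification, backfill them once, and keep them in sync with triggers. The schema stub, the column types (uuid, timestamptz, and text standing in for the question's activity_type enum) and all function and trigger names are assumptions, not taken from the question.

```sql
-- Assumed minimal schema; column types are guesses from the question's plans.
CREATE TABLE IF NOT EXISTS activity (
    id       uuid PRIMARY KEY,
    space_id uuid NOT NULL,
    type     text NOT NULL,         -- the enum activity_type in the question
    end_time timestamptz NOT NULL
);
CREATE TABLE IF NOT EXISTS notification (
    user_id     uuid    NOT NULL,
    activity_id uuid    NOT NULL REFERENCES activity (id),
    read        boolean NOT NULL DEFAULT false
);

-- Copies of the activity columns the query filters and sorts on.
ALTER TABLE notification
    ADD COLUMN IF NOT EXISTS space_id uuid,
    ADD COLUMN IF NOT EXISTS type     text,  -- activity_type on the real schema
    ADD COLUMN IF NOT EXISTS end_time timestamptz;

-- One-off backfill for existing rows.
UPDATE notification AS n
SET    space_id = a.space_id, type = a.type, end_time = a.end_time
FROM   activity AS a
WHERE  a.id = n.activity_id;

-- Fill the copies whenever a notification is inserted or re-pointed.
CREATE OR REPLACE FUNCTION notification_copy_activity_cols()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    SELECT a.space_id, a.type, a.end_time
    INTO   NEW.space_id, NEW.type, NEW.end_time
    FROM   activity AS a
    WHERE  a.id = NEW.activity_id;
    RETURN NEW;
END;
$$;

DROP TRIGGER IF EXISTS notification_copy_activity_cols ON notification;
CREATE TRIGGER notification_copy_activity_cols
    BEFORE INSERT OR UPDATE OF activity_id ON notification
    FOR EACH ROW EXECUTE PROCEDURE notification_copy_activity_cols();

-- Push later changes on activity down to every notification pointing at it.
CREATE OR REPLACE FUNCTION activity_push_cols()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    UPDATE notification
    SET    space_id = NEW.space_id, type = NEW.type, end_time = NEW.end_time
    WHERE  activity_id = NEW.id;
    RETURN NULL;  -- ignored for AFTER ... FOR EACH ROW triggers
END;
$$;

DROP TRIGGER IF EXISTS activity_push_cols ON activity;
CREATE TRIGGER activity_push_cols
    AFTER UPDATE OF space_id, type, end_time ON activity
    FOR EACH ROW EXECUTE PROCEDURE activity_push_cols();
```

The triggers are the ongoing cost of this approach: every change to an activity's space, type or end time also rewrites every notification that points at it.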

I was hoping Postgres could create an index spanning both tables, but everything I've read so far says that isn't possible.

So my question is: what is the best way to optimize this query?

Edit: after creating the suggested indexes:

create index tmp_index_1 on activity using btree (
    space_id, 
    id, 
    end_time
) where (
    type != 'commented'
);

create index tmp_index_2 on notification using btree (
    user_id,
    activity_id
) where (
    read = true
);

Query performance improved by about 3x.

explain analyse
select activity.id from activity
INNER JOIN notification  ON  notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
    and notification.read = true
    and notification.activity_id = activity.id
    and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
    and activity.type != 'commented'
order by activity.end_time desc
limit 20;

Limit  (cost=955.26..955.27 rows=1 width=24) (actual time=4.386..4.397 rows=20 loops=1)
  ->  Sort  (cost=955.26..955.27 rows=1 width=24) (actual time=4.385..4.389 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Nested Loop  (cost=1.12..955.25 rows=1 width=24) (actual time=0.035..4.244 rows=447 loops=1)
              ->  Index Only Scan using tmp_index_2 on notification  (cost=0.56..326.71 rows=124 width=16) (actual time=0.017..1.039 rows=1218 loops=1)
                    Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Heap Fetches: 689
              ->  Index Only Scan using tmp_index_1 on activity  (cost=0.56..5.07 rows=1 width=24) (actual time=0.002..0.002 rows=0 loops=1218)
                    Index Cond: ((space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid) AND (id = notification.activity_id))
                    Heap Fetches: 1
Planning time: 0.484 ms
Execution time: 4.428 ms

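One thing that bothers me about this query is rows=1218 and loops=1218: it walks through all of the user's read notifications and looks each one up in the activity table.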

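I'd like to be able to create an index that reads all of this in a way that emulates denormalizing the data. For example, if I added space_id, type and end_time to the notification table, I could create the following index and read the results in milliseconds.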

create index tmp_index_3 on notification using btree (
    user_id,
    space_id,
    end_time desc
) where (
    read = true 
    and type != 'commented'
);
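Assuming notification had been denormalized as described, a query written against it could look like the sketch below. Its WHERE clause repeats tmp_index_3's predicate so the planner can prove the partial index applies, and equality on user_id and space_id lets it read entries already ordered by end_time DESC and stop after 20 rows. The table stub and its column types are guesses.

```sql
-- Assumed: notification already carries copies of space_id, type and end_time.
CREATE TABLE IF NOT EXISTS notification (
    user_id     uuid    NOT NULL,
    activity_id uuid    NOT NULL,
    read        boolean NOT NULL DEFAULT false,
    space_id    uuid,
    type        text,               -- the enum activity_type in the question
    end_time    timestamptz
);

-- tmp_index_3 as proposed in the question.
CREATE INDEX IF NOT EXISTS tmp_index_3 ON notification USING btree (
    user_id, space_id, end_time DESC
) WHERE read = true AND type != 'commented';

-- No join: the filter columns, sort key and activity_id all live on notification.
SELECT n.activity_id
FROM   notification AS n
WHERE  n.user_id  = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
AND    n.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
AND    n.read = true               -- matches the index predicate
AND    n.type != 'commented'       -- matches the index predicate
ORDER  BY n.end_time DESC
LIMIT  20;
```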

Is this impossible in Postgres without denormalizing?

Answer 1

Looking at your code, you should use composite indexes on the filtered columns:

table notification, columns: user_id, read, activity_id

table activity, columns: space_id, type, id

For the query and its ORDER BY, you can also add end_time to the composite index on activity:

table activity, columns: space_id, type, id, end_time
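Written out as DDL, the indexes described above might look like this; the index names are made up and the table stubs use guessed column types. Because `!=` cannot be used as a B-tree search condition, placing type before id in the activity index may keep id from narrowing the scan; the asker's later tmp_index_1 avoids that by moving the type test into a partial-index predicate, which is why its plan shows both space_id and id in the Index Cond.

```sql
-- Assumed minimal schema; column types are guesses from the question's plans.
CREATE TABLE IF NOT EXISTS activity (
    id       uuid PRIMARY KEY,
    space_id uuid NOT NULL,
    type     text NOT NULL,         -- the enum activity_type in the question
    end_time timestamptz NOT NULL
);
CREATE TABLE IF NOT EXISTS notification (
    user_id     uuid    NOT NULL,
    activity_id uuid    NOT NULL REFERENCES activity (id),
    read        boolean NOT NULL DEFAULT false
);

-- Equality filters first (user_id, read), then the join key.
CREATE INDEX IF NOT EXISTS notification_user_read_activity_idx
    ON notification (user_id, read, activity_id);

-- Filter columns, join key, then end_time for the ORDER BY.
CREATE INDEX IF NOT EXISTS activity_space_type_id_end_time_idx
    ON activity (space_id, type, id, end_time);
```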

You should also use explicit INNER JOIN syntax:

select activity.id from activity
INNER JOIN notification  ON  notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
    and notification.read = true
    and notification.activity_id = activity.id
    and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
    and activity.type != 'commented'
order by activity.end_time desc
limit 20;

Answer 2

Add these indexes:

create index ix1_activity on activity (space_id, type, end_time, id);

create index ix2_notification on notification (activity_id, user_id, read);

These two "covering indexes" can make your query really fast.

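Also, with some luck, the planner will read the activity table first (only 20 rows) and do a nested loop join (NLJ) into notification, so the index traversal is kept to a minimum.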
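One way to encourage that activity-first plan is to phrase the query as a semi-join with EXISTS and give activity an index that already returns one space's rows in end_time order. The sketch below is a variant, not Answer 2's exact suggestion: the index name activity_space_end_time_idx is hypothetical, and it moves the type test into a partial-index predicate because with type in second position ix1_activity returns a space's rows ordered by type first, not by end_time. ix2_notification is taken from the answer as written; the table stubs use guessed column types.

```sql
-- Assumed minimal schema; column types are guesses from the question's plans.
CREATE TABLE IF NOT EXISTS activity (
    id       uuid PRIMARY KEY,
    space_id uuid NOT NULL,
    type     text NOT NULL,         -- the enum activity_type in the question
    end_time timestamptz NOT NULL
);
CREATE TABLE IF NOT EXISTS notification (
    user_id     uuid    NOT NULL,
    activity_id uuid    NOT NULL REFERENCES activity (id),
    read        boolean NOT NULL DEFAULT false
);

-- Hypothetical variant of ix1_activity: one space's non-comment rows, newest first.
CREATE INDEX IF NOT EXISTS activity_space_end_time_idx
    ON activity (space_id, end_time DESC) WHERE type != 'commented';

-- ix2_notification from Answer 2: probed once per candidate activity.
CREATE INDEX IF NOT EXISTS ix2_notification
    ON notification (activity_id, user_id, read);

-- Walk the space's activities newest first; each EXISTS probe stops at its
-- first matching notification, and LIMIT ends the walk after 20 hits.
SELECT a.id
FROM   activity AS a
WHERE  a.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
AND    a.type != 'commented'
AND    EXISTS (
           SELECT 1
           FROM   notification AS n
           WHERE  n.activity_id = a.id
           AND    n.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
           AND    n.read = true)
ORDER  BY a.end_time DESC
LIMIT  20;
```

Two caveats: EXISTS returns each activity once even if the user has several read notifications for it, which the original join would repeat; and walking activities newest first only pays off if the user has read a reasonable share of the space's recent activities. If they have read few, the scan may visit many activities before finding 20, and the notification-first plan can win. The planner weighs this from its row estimates, so comparing both with EXPLAIN ANALYZE is the safest check.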

Index columns across multiple tables in PostgreSQL
