Question

I have a table with 15M rows that stores users' inbox data:

 user_id         | integer                  | not null
 subject         | character varying(255)   | not null 
...
 last_message_id | integer                  | 
 last_message_at | timestamp with time zone |
 deleted_at      | timestamp with time zone |

In short, this is the slow query:

SELECT * 
FROM dialogs 
WHERE user_id = 1234 
AND deleted_at IS NULL 
LIMIT 21

The full query (irrelevant fields removed):

SELECT "dialogs"."id", "dialogs"."subject", "dialogs"."product_id", "dialogs"."user_id", "dialogs"."participant_id", "dialogs"."thread_id", "dialogs"."last_message_id", "dialogs"."last_message_at", "dialogs"."read_at", "dialogs"."deleted_at", "products"."id", ... , T4."id", ... , "messages"."id", ...,  
FROM "dialogs" 
LEFT OUTER JOIN "products" ON ("dialogs"."product_id" = "products"."id") 
INNER JOIN "auth_user" T4 ON ("dialogs"."participant_id" = T4."id")
LEFT OUTER JOIN "messages" ON ("dialogs"."last_message_id" = "messages"."id") 
WHERE ("dialogs"."deleted_at" IS NULL AND "dialogs"."user_id" = 9069) 
ORDER BY "dialogs"."last_message_id" DESC
LIMIT 21;

EXPLAIN ANALYZE output:

Limit  (cost=1.85..28061.24 rows=21 width=1693) (actual time=4.700..93087.871 rows=17 loops=1)
  ->  Nested Loop Left Join  (cost=1.85..9707215.30 rows=7265 width=1693) (actual time=4.699..93087.861 rows=17 loops=1)
        ->  Nested Loop  (cost=1.41..9647421.07 rows=7265 width=1457) (actual time=4.689..93062.481 rows=17 loops=1)
              ->  Nested Loop Left Join  (cost=0.99..9611285.66 rows=7265 width=1115) (actual time=4.676..93062.292 rows=17 loops=1)
                    ->  Index Scan Backward using dialogs_last_message_id on dialogs  (cost=0.56..9554417.92 rows=7265 width=102) (actual time=4.629..93062.050 rows=17 loops=1)
                          Filter: ((deleted_at IS NULL) AND (user_id = 9069))
                          Rows Removed by Filter: 6852907
                    ->  Index Scan using products_pkey on products  (cost=0.43..7.82 rows=1 width=1013) (actual time=0.012..0.012 rows=1 loops=17)
                          Index Cond: (dialogs.product_id = id)
              ->  Index Scan using auth_user_pkey on auth_user t4  (cost=0.42..4.96 rows=1 width=342) (actual time=0.009..0.010 rows=1 loops=17)
                    Index Cond: (id = dialogs.participant_id)
        ->  Index Scan using messages_pkey on messages  (cost=0.44..8.22 rows=1 width=236) (actual time=1.491..1.492 rows=1 loops=17)
              Index Cond: (dialogs.last_message_id = id)
Total runtime: 93091.494 ms
(14 rows)

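OFFSET is not used.
There is an index on the user_id field.
The index on deleted_at is not used because the condition is not selective: about 90% of the values are actually NULL. A partial index (... WHERE deleted_at IS NULL) does not help either.
The query gets especially slow when the matching rows were created a long time ago. It then has to scan and discard the millions of rows in between.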

List of indexes:

Indexes:
    "dialogs_pkey" PRIMARY KEY, btree (id)
    "dialogs_deleted_at_d57b320e_uniq" btree (deleted_at) WHERE deleted_at IS NULL
    "dialogs_last_message_id" btree (last_message_id)
    "dialogs_participant_id" btree (participant_id)
    "dialogs_product_id" btree (product_id)
    "dialogs_thread_id" btree (thread_id)
    "dialogs_user_id" btree (user_id)

At the moment I'm considering querying only recent data (i.e. ... WHERE last_message_at > <date 3-6 months ago>) with an appropriate index (BRIN?).
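A rough sketch of that idea, assuming PostgreSQL 9.5 or later (BRIN does not exist in earlier versions). The index name and the six-month cutoff are illustrative, not taken from the real schema:

```sql
-- Hypothetical index name. BRIN only pays off when last_message_at
-- correlates well with the physical order of rows in the table.
CREATE INDEX dialogs_last_message_at_brin
    ON dialogs USING brin (last_message_at);

-- Restrict the search to recent dialogs (cutoff is an assumption).
SELECT *
FROM dialogs
WHERE user_id = 1234
  AND deleted_at IS NULL
  AND last_message_at > now() - interval '6 months'
ORDER BY last_message_id DESC
LIMIT 21;
```

Note that this changes the result set: dialogs older than the cutoff would no longer be returned.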

What is the best practice for speeding up queries like this?

Answer 1

As posted in the comments:

Start by creating a partial index on (user_id, last_message_id) with the condition WHERE deleted_at IS NULL.
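A minimal sketch of that suggestion. The index name is made up; the columns and the predicate come from the comment, and the test query mirrors the one in the question:

```sql
-- Hypothetical index name. CONCURRENTLY avoids blocking writes while the
-- index is built, but it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY dialogs_user_id_last_message_id_live
    ON dialogs (user_id, last_message_id)
    WHERE deleted_at IS NULL;

-- Refresh planner statistics, then re-check the plan.
ANALYZE dialogs;

EXPLAIN ANALYZE
SELECT *
FROM dialogs
WHERE user_id = 9069
  AND deleted_at IS NULL
ORDER BY last_message_id DESC
LIMIT 21;
```

With user_id as the leading column and last_message_id second, the planner can read one user's non-deleted rows already sorted by last_message_id, so it should no longer need to walk dialogs_last_message_id backwards and filter out millions of rows.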

Judging by your answer, this seems to have worked very well :-)

Answer 2

So, here are the results of the solutions I tried:

1) The (user_id) WHERE deleted_at IS NULL index is used only in rare cases, depending on the particular user_id value in the WHERE user_id = ? condition. Most of the time the query still has to filter out rows as before.

2) The biggest speedup came from the index shown below. Although it is 2.5 times larger than the other indexes I tested, it is used every time and is very fast:

(user_id, last_message_id) WHERE deleted_at IS NULL
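One way to check the size and usage of each index, sketched with PostgreSQL's standard statistics view. It assumes the table is named dialogs, as in the question:

```sql
-- List every index on dialogs with its on-disk size and the number of
-- index scans recorded since statistics were last reset.
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan AS scans
FROM pg_stat_user_indexes
WHERE relname = 'dialogs'
ORDER BY pg_relation_size(indexrelid) DESC;
```

An index whose scan count stays at or near zero under normal traffic is a candidate for removal.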

Thanks, @jcaron. Your suggestion should be the accepted answer.
