When using LIMIT..OFFSET

Date: 2019-06-06 11:06:04

Tags: postgresql indices postgresql-9.6

PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

Table and indexes:

create table if not exists orders
(
    id bigserial not null constraint orders_pkey primary key,
    partner_id integer,
    order_id varchar,
    date_created date,
    state_code integer,
    state_date timestamp,
    recipient varchar,
    phone varchar
);

create index if not exists orders_partner_id_index on orders (partner_id);
create index if not exists orders_order_id_index on orders (order_id);
create index if not exists orders_partner_id_date_created_index on orders (partner_id, date_created);

The task is to implement pagination, sorting and filtering of the data.

Query for the first page:

select order_id, date_created, recipient, phone, state_code, state_date
from orders
where partner_id=1 and date_created between '2019-04-01' and '2019-04-30'
order by order_id asc limit 10 offset 0;

Query plan:

QUERY PLAN
"Limit  (cost=19495.48..38990.41 rows=10 width=91)"
"  ->  Index Scan using orders_order_id_index on orders  (cost=0.56..41186925.66 rows=21127 width=91)"
"        Filter: ((date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date) AND (partner_id = 1))"

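The index orders_partner_id_date_created_index is not used, so the cost is very high!

But from a certain offset onwards (the exact value sometimes varies and seems to depend on the total number of rows), the index starts being used: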

select order_id, date_created, recipient, phone, state_code, state_date
from orders
where partner_id=1 and date_created between '2019-04-01' and '2019-04-30'
order by order_id asc limit 10 offset 40;

Plan:

QUERY PLAN
"Limit  (cost=81449.76..81449.79 rows=10 width=91)"
"  ->  Sort  (cost=81449.66..81502.48 rows=21127 width=91)"
"        Sort Key: order_id"
"        ->  Bitmap Heap Scan on orders  (cost=4241.93..80747.84 rows=21127 width=91)"
"              Recheck Cond: ((partner_id = 1) AND (date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date))"
"              ->  Bitmap Index Scan on orders_partner_id_date_created_index  (cost=0.00..4236.65 rows=21127 width=0)"
"                    Index Cond: ((partner_id = 1) AND (date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date))"

What is going on here? Is there a way to force the server to use the index?

1 answer:

Answer 0 (score: 3)

General answer:

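  • Postgres stores statistics about each table.
  • Before a query is executed, the planner builds an execution plan based on these statistics.
  • In your case, the planner estimates that for some offset values this suboptimal plan is cheaper. Note that the plan you want has to sort all selected rows by order_id, while this "worse" plan does not: it walks the order_id index in order and checks each row against the filter. My guess is that Postgres is betting there are many matching rows spread across all orders, so it simply checks one order after another, starting from the lowest order_id.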
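To make the trade-off in the last point concrete, here is a rough Python model of how a Limit node's cost is derived from its child plan, fed with the figures from the two EXPLAIN outputs in the question. It is a simplified sketch, not the planner's source: it assumes the child's run cost is spread evenly over its estimated rows, that the planner simply keeps the plan with the lower Limit total, and it holds the Sort estimate fixed (in reality that estimate shifts slightly with OFFSET + LIMIT, because it is costed as a top-N sort). The function names are illustrative only.

```python
# Rough model of LIMIT/OFFSET costing in PostgreSQL (a simplified sketch,
# not the planner's source): a Limit node spreads its child's run cost
# evenly over the child's estimated rows.


def limit_cost(child_startup, child_total, child_rows, offset, limit):
    """Return (startup, total) cost of a Limit node over a child plan.

    Offset rows are charged to startup, limit rows to the total.
    Clamping of offset/limit to the row estimate is ignored here.
    """
    per_row = (child_total - child_startup) / child_rows
    startup = child_startup + per_row * offset
    total = startup + per_row * limit
    return startup, total


# Child plan estimates copied from the two EXPLAIN outputs in the question:
# (startup cost, total cost, estimated rows).
INDEX_SCAN = (0.56, 41186925.66, 21127)  # Index Scan using orders_order_id_index
SORT = (81449.66, 81502.48, 21127)       # Sort over the Bitmap Heap Scan


def cheaper_plan(offset, limit=10):
    """Name the plan with the lower estimated Limit total cost."""
    _, index_total = limit_cost(*INDEX_SCAN, offset, limit)
    _, sort_total = limit_cost(*SORT, offset, limit)
    return "index scan" if index_total < sort_total else "sort"


def switch_offset(limit=10):
    """Smallest OFFSET at which the sort-based plan wins in this model."""
    offset = 0
    while cheaper_plan(offset, limit) == "index scan":
        offset += 1
    return offset


if __name__ == "__main__":
    for off in (0, 10, 40):
        print(off, cheaper_plan(off))
    print("switches to the sort-based plan at OFFSET", switch_offset())
```

With these inputs the model reproduces the Limit line of the second plan (81449.76..81449.79) and puts the switch at OFFSET 32. It also suggests why the threshold depends on the row estimate: the index scan's huge total cost (it may have to walk the whole order_id index) is divided by the estimated 21127 matching rows, so a different estimate moves the crossover point.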

I can think of two solutions:

A) Give the planner more data by running

ANALYZE orders;

https://www.postgresql.org/docs/9.6/sql-analyze.html

or by raising the statistics target, which controls how detailed the collected statistics are:

ALTER TABLE orders ALTER COLUMN ... SET STATISTICS ...;

https://www.postgresql.org/docs/9.6/planner-stats.html

B) Rewrite the query in a way that hints at the desired index usage, for example:

WITH
partner_date (partner_id, date_created) AS (
    SELECT  1,
            generate_series('2019-04-01'::date, '2019-04-30'::date, '1 day'::interval)::date
)
SELECT o.order_id, o.date_created, o.recipient, o.phone, o.state_code, o.state_date
FROM   orders o
JOIN   partner_date pd
    ON (o.partner_id, o.date_created) = (pd.partner_id, pd.date_created)
ORDER BY order_id ASC LIMIT 10 OFFSET 0;
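If generate_series over dates is unfamiliar: the partner_date CTE above just enumerates one (partner_id, date_created) row per day, with both endpoints included, so each row can be matched by equality against orders_partner_id_date_created_index. A Python equivalent of that enumeration, as a sketch (the helper name partner_date is hypothetical and only mirrors the CTE):

```python
from datetime import date, timedelta


def partner_date(partner_id, start, stop):
    """Return the (partner_id, date_created) pairs the CTE would produce.

    Like generate_series(start, stop, '1 day'), both endpoints are included.
    """
    days = (stop - start).days
    return [(partner_id, start + timedelta(days=i)) for i in range(days + 1)]


pairs = partner_date(1, date(2019, 4, 1), date(2019, 4, 30))
print(len(pairs), pairs[0], pairs[-1])
```

For April 2019 this yields 30 pairs, from (1, 2019-04-01) to (1, 2019-04-30), i.e. 30 small equality lookups instead of one range filter.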

Possibly better:

WITH
partner_date (partner_id, date_created) AS (
    SELECT  1,
            generate_series('2019-04-01'::date, '2019-04-30'::date, '1 day'::interval)::date
), 
all_data AS (
    SELECT o.order_id, o.date_created, o.recipient, o.phone, o.state_code, o.state_date
    FROM   orders o
    JOIN   partner_date pd
        ON (o.partner_id, o.date_created) = (pd.partner_id, pd.date_created)
)
SELECT *
FROM   all_data
ORDER BY order_id ASC LIMIT 10 OFFSET 0;

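Disclaimer: I can't explain why the Postgres planner should treat the first query differently; I just think it might. The second query, on the other hand, separates OFFSET/LIMIT from the join (in 9.6 every CTE is evaluated on its own, as an optimization fence), so I would be very surprised if Postgres still executed it the "bad" way (according to your benchmarks).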