Question

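We have a query written for Oracle that was adapted to run in PostgreSQL. The logic is identical, as are the contents of the tables, but the Oracle query runs significantly faster. The table structures match across both databases.

In comparing the explain plans, there is one major difference: Oracle takes advantage of an index (a primary key, actually), and the PostgreSQL query does not.

I have tried to simplify the structures as follows to demonstrate the issue.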

create table stage.work_order_operations (
  work_order text not null,
  sequence_number integer not null,
  status text,
  remaining_hours numeric,
  complete_date date,
  constraint work_order_operations_pk primary key (work_order, sequence_number)
);

create index work_order_operations_ix1 on stage.work_order_operations (complete_date);

create table stage.work_order_master (
  work_order text not null,
  status_code text not null,
  part_number text not null,
  quantity integer,
  constraint work_order_master_pk primary key (work_order)
);

create index work_order_master_ix1 on stage.work_order_master (status_code);

The query is as follows:

select
  op.*
from
  stage.work_order_master wo
  join stage.work_order_operations op on
    wo.work_order = op.work_order
where
  wo.status_code <= '90'

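In this case, the condition status_code <= '90' severely limits the number of records in the wo table, from several tens of millions down to around 15,000. I would have expected the query to take advantage of the reduced data set and use the work_order_operations_pk index (key), but it does not: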

Hash Join  (cost=19.93..40.52 rows=207 width=200)
  Hash Cond: (op.work_order = wo.work_order)
  ->  Seq Scan on work_order_operations op  (cost=0.00..16.20 rows=620 width=100)
  ->  Hash  (cost=17.34..17.34 rows=207 width=100)
        ->  Bitmap Heap Scan on work_order_master wo  (cost=4.75..17.34 rows=207 width=100)
              Recheck Cond: (status_code <= '90'::text)
              ->  Bitmap Index Scan on work_order_master_ix1  (cost=0.00..4.70 rows=207 width=0)
                    Index Cond: (status_code <= '90'::text)

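I have several questions:

1. Is it possible that the explain plan does not match the execution plan, and PostgreSQL really is using the index?
2. Is there a way to see the actual execution plan on PostgreSQL after the query has run?
3. Is there a way to force the database to use an index, similar to hints in Oracle (although I don't know whether Oracle actually has such a hint; I do know it has one to suppress the use of an index)?
4. Does anyone have any other ideas for making this query run faster?

Regarding question #1, on the surface the index is not being used, so I think I know the answer, but I wanted to be sure.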

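- EDIT 10/19/15 -

Thanks, everyone, for the feedback and suggestions. For what it's worth, here are some statistics and a possible stop-gap.

I created a temp table that holds the work order operations tied to open work orders (status < 91). This cuts the number of records from 40,000,000 to about 160,000.

Method 1: standard join - takes 283 seconds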

truncate table stage.open_work_order_operations;

insert into stage.open_work_order_operations
SELECT 
  op.*  
FROM 
  stage.work_order_master wo
  join stage.work_order_operations op on
    wo.work_order = op.work_order
WHERE 
  wo.status <= '91' and
  op.complete_date >= '2006-01-01';

Method 2: semi-join - takes 242 seconds:

truncate table stage.open_work_order_operations;

insert into stage.open_work_order_operations
SELECT 
  op.*  
FROM 
  stage.work_order_operations op
where
  exists (
    select null
    from stage.work_order_master wo
    where
      wo.work_order = op.work_order  AND
      wo.status <= '91'
  ) and
  op.complete_date >= '2006-01-01';

Method 3: in-list subquery - takes 216 seconds

truncate table stage.open_work_order_operations;

insert into stage.open_work_order_operations
SELECT 
  op.*  
FROM 
  stage.work_order_operations op
WHERE 
  op.work_order in (
      select work_order
      from stage.work_order_master 
      where status <= '91') and
  op.complete_date >= '2006-01-01';

Here is the interesting one. If I wrap it in a function and treat the list of work orders as an array, it completes in 166 seconds:

CREATE OR REPLACE FUNCTION stage.open_work_order_data()
  RETURNS integer AS
$BODY$
DECLARE
  rowcount integer := 0;
  work_orders text[];
BEGIN

  select array_agg(work_order)
  into work_orders
  from stage.work_order_master
  where status <= '91';

  truncate table stage.open_work_order_operations;

  insert into stage.open_work_order_operations
  select *
  from stage.work_order_operations op
  where op.work_order = any (work_orders)
  and complete_date >= '2006-01-01';

  GET DIAGNOSTICS rowcount = ROW_COUNT;

  return rowcount;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

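Some of you suggested running the execution plan. I want to do that, especially for the last one, because I suspect it IS using work_order_operations_pk, which should make a big difference.

Also, the tables do have current statistics. When I ran the explain plan listed originally, I was working with a subset of the data, just to demonstrate my issue. The behavior on the subset and on the entire data set appears to be identical.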

Answer 1

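So, you filter the wo table down to 15,000 rows, and then you expect the server to use the primary key to do 15,000 seeks in the op table instead of scanning the op table. Did I get that right?

You can try rewriting the query to follow your preferred flow, like this: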

WITH
CTE
AS
(
    SELECT wo.work_order
    FROM stage.work_order_master AS wo
    WHERE wo.status_code <= '90'
)
SELECT T.*
FROM
    CTE
    INNER JOIN LATERAL
    (
        SELECT op.*
        FROM stage.work_order_operations AS op
        WHERE op.work_order = CTE.work_order
    ) AS T ON true

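Postgres materializes the CTE, so the first step, filtering the wo table, should use the index work_order_master_ix1.

The LATERAL join makes it explicit that for each row of the filtered wo table we want to find the matching rows in the op table. In this case it produces the same result as a simple INNER JOIN, but this syntax could "hint" the optimizer to do a search in the op table for each row from the CTE, instead of doing a hash join and scanning the op table.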
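
A caveat on the materialization point, offered as a sketch and not a verified fix: PostgreSQL 12 and later inline a side-effect-free CTE that is referenced only once, so on those versions the CTE no longer acts as an optimization fence by default. The MATERIALIZED keyword requests the older behavior. The query below is the same rewrite with that keyword added; whether the planner then chooses a nested loop over work_order_operations_pk still depends on its row estimates.

```sql
-- Assumes PostgreSQL 12 or later. Earlier versions reject the MATERIALIZED
-- keyword and always materialize CTEs anyway.
WITH
CTE
AS MATERIALIZED
(
    -- Step 1: reduce work_order_master to the open work orders.
    SELECT wo.work_order
    FROM stage.work_order_master AS wo
    WHERE wo.status_code <= '90'
)
SELECT T.*
FROM
    CTE
    INNER JOIN LATERAL
    (
        -- Step 2: look up the operations for each open work order.
        SELECT op.*
        FROM stage.work_order_operations AS op
        WHERE op.work_order = CTE.work_order
    ) AS T ON true;
```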

Please try it with your data and let us know what execution plan you get.
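
One way to collect that plan: EXPLAIN with the ANALYZE option actually runs the statement and reports the plan that was executed, with real row counts and timings next to the estimates. The sketch below also uses the closest thing PostgreSQL has to a hint, the planner settings enable_hashjoin, enable_mergejoin and enable_seqscan, which discourage (but do not forbid) those plan node types. Turning them off here is a diagnostic assumption, meant only for comparing timings against the hash join plan, not a recommended permanent setting.

```sql
-- 1. Actual plan the planner picks on its own for the original query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT op.*
FROM stage.work_order_master wo
JOIN stage.work_order_operations op ON wo.work_order = op.work_order
WHERE wo.status_code <= '90';

-- 2. Same query with hash joins, merge joins and sequential scans
--    discouraged. SET LOCAL lasts only until the end of the transaction.
BEGIN;
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;
SET LOCAL enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT op.*
FROM stage.work_order_master wo
JOIN stage.work_order_operations op ON wo.work_order = op.work_order
WHERE wo.status_code <= '90';

ROLLBACK;
```

If the second plan shows an index scan on work_order_operations_pk and a lower actual time, the planner's cost estimates are the thing to investigate; if it is slower, the hash join was the better choice for this data.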
