Why does a Postgres SQL function scan partitions that it should not?

Time: 2017-09-21 09:17:10

Tags: postgresql partitioning

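I have stumbled upon a very strange issue with my SQL functions. They appear to get different execution plans depending on whether the function is written in language SQL or language plpgsql, but I cannot tell which plan is chosen for the SQL version, because it fails with "Function's final statement must be SELECT or INSERT/UPDATE/DELETE RETURNING." and won't let me use EXPLAIN inside it.

I know the plans differ because the SQL version fails to execute, complaining that it cannot connect to one of the foreign servers, which is currently down. The connection is made through foreign tables. The table is partitioned by date (column date_col); some of its partitions are physically on the same server and some are on foreign servers. The date parameter passed to the function ensures that only one partition should be scanned, and that partition is on the same server. EXPLAIN on the plain SQL (outside a function) shows the same thing: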

Append  (cost=2.77..39.52 rows=2 width=36)
  CTE ct
    ->  Result  (cost=0.00..0.51 rows=100 width=4)
  InitPlan 2 (returns $1)
    ->  Aggregate  (cost=2.25..2.26 rows=1 width=32)
          ->  CTE Scan on ct  (cost=0.00..2.00 rows=100 width=4)
  ->  Seq Scan on table1  (cost=0.00..0.00 rows=1 width=36)
        Filter: ((date_col = '2017-07-30'::date) AND (some_col = ANY ($1)))
  ->  Seq Scan on "part$_table1_201707"  (cost=0.00..36.75 rows=1 width=36)
        Filter: ((date_col = '2017-07-30'::date) AND (some_col = ANY ($1)))

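The foreign partitions all hold data from before 2017, and the plan shows that the planner picks the correct partition and does not bother scanning any other. This holds for plain SQL and for the plpgsql function, but not for the sql function. Why could that be, and can I avoid it without rewriting my functions?

As far as I can tell, there must be some difference in how parameters are passed in an SQL function, because hardcoding the date in it prevents the query from scanning unnecessary partitions. Maybe something like this happens: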

WITH ct AS (SELECT unnest(array[1,2]) AS arr)
  SELECT col1, col2
    FROM table1
   WHERE date_col = (SELECT '2017-07-30'::date)
     AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])

which produces this EXPLAIN:

Append  (cost=2.78..183.67 rows=3 width=36)
  CTE ct
    ->  Result  (cost=0.00..0.51 rows=100 width=4)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.01 rows=1 width=4)
  InitPlan 3 (returns $2)
    ->  Aggregate  (cost=2.25..2.26 rows=1 width=32)
          ->  CTE Scan on ct  (cost=0.00..2.00 rows=100 width=4)
  ->  Seq Scan on table1  (cost=0.00..0.00 rows=1 width=36)
        Filter: ((date_col = $1) AND (some_col = ANY ($2)))
  ->  Seq Scan on "part$_table1_201707"  (cost=0.00..36.75 rows=1 width=36)
        Filter: ((date_col = $1) AND (some_col = ANY ($2)))
  ->  Foreign Scan on "part$_table1_201603"  (cost=100.00..144.14 rows=1 width=36)

For reference, you can reproduce the issue on PostgreSQL 9.6.4 using the code below:

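-- postgres_fdw must be installed before the server can be created
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER broken_server FOREIGN DATA WRAPPER postgres_fdw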
OPTIONS (host 'broken_server', dbname 'postgres',
         port '5432');

CREATE USER MAPPING FOR postgres SERVER broken_server 
OPTIONS (user 'foreign_username', password 'foreign_password');

CREATE TABLE table1 (id serial PRIMARY KEY, date_col date,
                     some_col int, col1 int, col2 text);

CREATE TABLE part$_table1_201707 ()
INHERITS (table1);
ALTER TABLE part$_table1_201707 ADD CONSTRAINT part$_table1_201707_date_chk
        CHECK (date_col BETWEEN '2017-07-01'::date AND '2017-07-31'::date);

CREATE FOREIGN TABLE part$_table1_201603 ()
INHERITS (table1) SERVER broken_server
OPTIONS (schema_name 'public', table_name 'part$_table1_201603');
ALTER TABLE part$_table1_201603 ADD CONSTRAINT part$_table1_201603_date_chk
        CHECK (date_col BETWEEN '2016-03-01'::date AND '2016-03-31'::date);

CREATE OR REPLACE FUNCTION function_plpgsql(param1 date, param2 int[])
 RETURNS TABLE(col1 int, col2 text)
 LANGUAGE plpgsql
 SECURITY DEFINER
AS $function$
BEGIN
  --
  RETURN QUERY
  WITH ct AS (SELECT unnest(param2) AS arr)
  SELECT t.col1, t.col2
    FROM table1 AS t
   WHERE date_col = param1
     AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[]); --reasons
  --
END;
$function$;

CREATE OR REPLACE FUNCTION function_sql(param1 date, param2 int[])
 RETURNS TABLE(col1 int, col2 text)
 LANGUAGE SQL
 SECURITY DEFINER
AS $function$
  --
  WITH ct AS (SELECT unnest(param2) AS arr)
  SELECT t.col1, t.col2
    FROM table1 AS t
   WHERE date_col = param1
     AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])
  --
$function$;

CREATE OR REPLACE FUNCTION function_sql_hardcoded(param1 date, param2 int[])
 RETURNS TABLE(col1 int, col2 text)
 LANGUAGE SQL
 SECURITY DEFINER
AS $function$
  --
  WITH ct AS (SELECT unnest(param2) AS arr)
  SELECT t.col1, t.col2
    FROM table1 AS t
   WHERE date_col = '2017-07-30'::date
     AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])
  --
$function$;

EXPLAIN ANALYZE
SELECT * FROM function_sql('2017-07-30'::date, array[1,2]);
-- ERROR: could not connect to server "broken_server"

EXPLAIN ANALYZE
SELECT * FROM function_plpgsql('2017-07-30'::date, array[1,2]);
--works

EXPLAIN ANALYZE
SELECT * FROM function_sql_hardcoded('2017-07-30'::date, array[1,2]);
--works, but useless

1 answer:

Answer 0 (score: 0)

https://www.postgresql.org/docs/current/static/ddl-partitioning.html

  

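Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.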
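
As a rough illustration of the quoted rule, the two statements below differ only in whether the comparison value is known at plan time. This is a sketch that assumes the table1 hierarchy from the question and the default constraint_exclusion setting. Plain EXPLAIN (without ANALYZE) is used so that nothing is actually executed.

```sql
-- Assumes the table1 / part$_table1_201707 / part$_table1_201603 setup
-- from the question.
SHOW constraint_exclusion;  -- the default value is 'partition'

-- Constant in the WHERE clause: the planner can compare it with each
-- child's CHECK constraint and drop children that cannot match.
EXPLAIN
SELECT col1, col2
  FROM table1
 WHERE date_col = '2017-07-30'::date;

-- current_date is STABLE, not IMMUTABLE, so its value is not known at
-- plan time; the planner is expected to keep every child table here,
-- including the foreign one.
EXPLAIN
SELECT col1, col2
  FROM table1
 WHERE date_col = current_date;
```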

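That restriction would explain the scan of unnecessary partitions: I suppose plpgsql processes the query before handing it to the optimizer, and an sql function with a constant should work, as should a prepared statement, I guess. But comparing a column value with a function parameter is probably not one of the supported cases :)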
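
The prepared-statement guess can be checked with a sketch like the one below. It assumes the tables from the question; the statement name q is made up for illustration. PostgreSQL normally plans the first few executions of a prepared statement with the actual parameter values and may switch to a generic plan later, so the result can depend on how often the statement has already been executed.

```sql
-- Hypothetical statement name "q"; same query as in function_sql,
-- with $1 and $2 in place of param1 and param2.
PREPARE q(date, int[]) AS
WITH ct AS (SELECT unnest($2) AS arr)
SELECT t.col1, t.col2
  FROM table1 AS t
 WHERE date_col = $1
   AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[]);

-- Plain EXPLAIN shows the plan without running the query. If the planner
-- used the supplied date as a constant, the foreign partition should be
-- missing from the output.
EXPLAIN EXECUTE q('2017-07-30', array[1,2]);

DEALLOCATE q;
```

If the foreign partition still appears in the plan, the statement was planned generically, which would match the behaviour seen with function_sql.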