Question

My table has some text columns and some numeric columns, for example:

dimension_1, dimension_2, counter_1, counter_2

Instead of running the query

SELECT dimension_1, dimension_2, (counter_1 / NULLIF(counter_2, 0)) as kpi
from table order by kpi desc nulls last;

I want to create a function and run:

SELECT dimension_1, dimension_2, func(counter_1, counter_2) as kpi
from table order by kpi desc nulls last;

I used the following implementation in Postgres:

CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)     
RETURNS numeric AS $func$
BEGIN

RETURN ($1 / NULLIF($2, 0::numeric));             

END; $func$
LANGUAGE PLPGSQL SECURITY DEFINER IMMUTABLE;

This returns the desired result, but performance is slower.

From EXPLAIN ANALYZE I get:

First query (with func):

Sort  (cost=800.85..806.75 rows=2358 width=26) (actual  time=5.534..5.710 rows=2358 loops=1)
Sort Key: (kpi_latency_ext_msec(external_tcp_handshake_latency_sum, external_tcp_handshake_latency_samples))
Sort Method: quicksort  Memory: 281kB
 ->  Seq Scan on counters_by_cgi_rat  (cost=0.00..668.76 rows=2358 width=26) (actual time=0.142..4.233 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone) AND (granularity = '1 day'::interval))
Planning time: 0.221 ms
Execution time: 5.881 ms

Second query (without func):

Sort  (cost=223.14..229.04 rows=2358 width=26) (actual time=1.933..2.114 rows=2358 loops=1)
Sort Key: ((external_tcp_handshake_latency_sum / NULLIF(external_tcp_handshake_latency_samples, 0::numeric)))
Sort Method: quicksort  Memory: 281kB
->  Seq Scan on counters_by_cgi_rat  (cost=0.00..91.06 rows=2358 width=26) (actual time=0.010..1.190 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone) AND (granularity = '1 day'::interval))
Planning time: 0.139 ms
Execution time: 2.279 ms

Running the queries without ORDER BY:

Without func:

Seq Scan on table (cost=0.00..91.06 rows=2358 width=26) (actual time=0.016..1.223 rows=2358 loops=1)

With func:

Seq Scan on table (cost=0.00..668.76 rows=2358 width=26) (actual time=0.123..3.518 rows=2358 loops=1)

Result for the function without SECURITY DEFINER:

Seq Scan on counters_by_cgi_rat  (cost=0.00..668.76 rows=2358 width=26) 
                                  (actual time=0.035..3.718 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) 
        AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone) 
        AND (granularity = '1 day'::interval))
  Planning time: 0.086 ms
  Execution time: 3.923 ms

Result for the plain query:

Seq Scan on counters_by_cgi_rat  (cost=0.00..91.06 rows=2358 width=26)    
                                 (actual time=0.017..1.175 rows=2358 loops=1)  
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) 
AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone)     
AND (granularity = '1 day'::interval))
 Planning time: 0.105 ms
 Execution time: 1.356 ms

Result with LANGUAGE sql:

 Seq Scan on counters_by_cgi_rat  (cost=0.00..91.06 rows=2358 width=26)          
                                  (actual time=0.011..1.123 rows=2358 loops=1)
 Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) 
         AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone) 
         AND (granularity = '1 day'::interval))
 Planning time: 0.180 ms
 Execution time: 1.294 ms

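With LANGUAGE sql the function is definitely faster than with LANGUAGE plpgsql, but still slightly slower than the plain query (after repeated runs).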

========= UPDATE =========

CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)
RETURNS numeric LANGUAGE sql STABLE AS
'SELECT $1 / NULLIF($2, 0)';

The function above gives the best result (even faster than the plain query).

Answer 1

The poison dart is SECURITY DEFINER. Functions declared SECURITY DEFINER cannot be inlined - and enforce a context switch if I am not mistaken. That can make them considerably more expensive. There is really no need for SECURITY DEFINER in the example. You do not need different privileges for the simple calculation. (Maybe your actual use case is different.)

And there is no need for PL/pgSQL either. Only SQL functions can be inlined - if some additional preconditions are met.
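One way to check whether a call was actually inlined is to look at the Output line of EXPLAIN (VERBOSE). An inlined SQL function shows up as its body expression, while a function that was not inlined shows up as a function call. A minimal sketch, assuming a scratch table t and the hypothetical function names f_sql and f_plpgsql (none of these are from the question):

```sql
-- Hypothetical scratch objects, for illustration only.
CREATE TEMP TABLE t (a numeric, b numeric);

CREATE FUNCTION f_sql(numeric, numeric)
  RETURNS numeric LANGUAGE sql IMMUTABLE AS
$$SELECT $1 / NULLIF($2, numeric '0')$$;

CREATE FUNCTION f_plpgsql(numeric, numeric)
  RETURNS numeric LANGUAGE plpgsql IMMUTABLE AS
$$BEGIN RETURN $1 / NULLIF($2, numeric '0'); END$$;

-- If the SQL function is inlined, "Output:" shows the division expression.
EXPLAIN (VERBOSE, COSTS OFF) SELECT f_sql(a, b) FROM t;

-- PL/pgSQL is never inlined, so "Output:" shows the function call itself.
EXPLAIN (VERBOSE, COSTS OFF) SELECT f_plpgsql(a, b) FROM t;
```

Drop the two scratch functions afterwards; the temporary table disappears at the end of the session.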

Since all used functions are IMMUTABLE, you should declare the function IMMUTABLE. (Default function volatility is VOLATILE.) You already updated the question accordingly. That allows expression indexes and can help prevent repeated evaluation in some situations. But it never helps with function inlining. Au contraire: it imposes more preconditions (which are met in this case). Quoting the Postgres Wiki on function inlining (last update 2016 at the time of writing):

if the function is declared IMMUTABLE, then the expression must not invoke any non-immutable function or operator

Quoting Tom Lane on pgsql-performance:

The basic point here is that a function marked volatile can be expanded to its contained functions even if they're immutable; but the other way around represents a potential semantic change, so the planner won't do it.
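To see how an existing function is currently declared, the system catalog can be queried. A small sketch against pg_proc, using the function name from the question; the column aliases are only illustrative:

```sql
-- Inspect language, volatility and SECURITY DEFINER for a function by name.
SELECT p.proname,
       l.lanname     AS lang,
       p.provolatile AS volatility,       -- 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
       p.prosecdef   AS security_definer  -- true if declared SECURITY DEFINER
FROM   pg_proc p
JOIN   pg_language l ON l.oid = p.prolang
WHERE  p.proname = 'kpi_latency_ext_msec';
```

Overloaded functions share a name, so this may return more than one row.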

Solution

Try without SECURITY DEFINER:

CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)     
  RETURNS numeric AS
$func$
BEGIN
   RETURN $1 / NULLIF($2, numeric '0');
END
$func$  LANGUAGE plpgsql IMMUTABLE;

Should be much faster already.

Or radically simplify to an SQL function:

CREATE FUNCTION f_div0_sql_nullif(val1 numeric, val2 numeric)     
  RETURNS numeric LANGUAGE sql IMMUTABLE AS
$$SELECT $1 / NULLIF($2, numeric '0')$$;

Faster, yet?

Related: How do IMMUTABLE, STABLE and VOLATILE keywords effect behaviour of function?

Benchmark

I used IF and CASE expressions at first, but after a_horse_with_no_name's comment I ran extensive tests showing NULLIF to be slightly faster. So I simplified to the original NULLIF variant accordingly.
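A comparison along these lines can be reproduced roughly as follows. This is only a sketch, not the benchmark from the fiddles: the table bench and its generated data are assumptions, and f_div0_sql_nullif is the SQL function defined in the Solution above.

```sql
-- Hypothetical test data: 100000 rows, every 10th divisor is 0.
CREATE TEMP TABLE bench AS
SELECT (random() * 1000)::numeric AS counter_1,
       (g % 10)::numeric          AS counter_2
FROM   generate_series(1, 100000) g;

-- Baseline: the plain expression.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT counter_1 / NULLIF(counter_2, 0) FROM bench;

-- Same computation through the function.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT f_div0_sql_nullif(counter_1, counter_2) FROM bench;
```

Run each statement several times and compare the reported execution times. TIMING OFF drops the per-node timing overhead, which would otherwise distort small differences.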

The major points are still: no SECURITY DEFINER, LANGUAGE sql, and IMMUTABLE.

db<>fiddle here - pg 10
db<>fiddle here - pg 9.4
