我的表中有一些text
和一些numeric
列,例如:
dimension_1, dimension_2, counter_1, counter_2
而不是执行查询
SELECT dimension_1, dimension_2, (counter_1, NULLIF(counter_2, 0)) as kpi
from table order by kpi desc nulls last;
我想创建一个函数并执行:
SELECT dimension_1, dimension_2, func(counter_1, counter_2) as kpi
from table order by kpi desc nulls last;
我在Postgres中使用了以下实现:
CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)
RETURNS numeric AS $func$
BEGIN
RETURN ($1 / NULLIF($2, 0::numeric));
END; $func$
LANGUAGE PLPGSQL SECURITY DEFINER IMMUTABLE;
并获得所需的结果,但性能较慢。
从EXPLAIN ANALYZE
我得到:
第一次查询(使用func):
Sort (cost=800.85..806.75 rows=2358 width=26) (actual time=5.534..5.710 rows=2358 loops=1)
Sort Key: (kpi_latency_ext_msec(external_tcp_handshake_latency_sum, external_tcp_handshake_latency_samples))
Sort Method: quicksort Memory: 281kB
-> Seq Scan on counters_by_cgi_rat (cost=0.00..668.76 rows=2358 width=26) (actual time=0.142..4.233 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone) AND (granularity = '1 day'::interval))
Planning time: 0.221 ms
Execution time: 5.881 ms
第二次查询(无功能):
Sort (cost=223.14..229.04 rows=2358 width=26) (actual time=1.933..2.114 rows=2358 loops=1)
Sort Key: ((external_tcp_handshake_latency_sum / NULLIF(external_tcp_handshake_latency_samples, 0::numeric)))
Sort Method: quicksort Memory: 281kB
-> Seq Scan on counters_by_cgi_rat (cost=0.00..91.06 rows=2358 width=26) (actual time=0.010..1.190 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone) AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone) AND (granularity = '1 day'::interval))
Planning time: 0.139 ms
Execution time: 2.279 ms
不使用ORDER BY
执行查询:
没有功能:
Seq Scan on table (cost=0.00..91.06 rows=2358 width=26) (actual time=0.016..1.223 rows=2358 loops=1)
使用func:
Seq Scan on table (cost=0.00..668.76 rows=2358 width=26) (actual time=0.123..3.518 rows=2358 loops=1)
功能无安全定义
Seq Scan on counters_by_cgi_rat (cost=0.00..668.76 rows=2358 width=26)
(actual time=0.035..3.718 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone)
AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone)
AND (granularity = '1 day'::interval))
Planning time: 0.086 ms
Execution time: 3.923 ms
结果用于普通查询
Seq Scan on counters_by_cgi_rat (cost=0.00..91.06 rows=2358 width=26)
(actual time=0.017..1.175 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone)
AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone)
AND (granularity = '1 day'::interval))
Planning time: 0.105 ms
Execution time: 1.356 ms
使用语言sql 结果
Seq Scan on counters_by_cgi_rat (cost=0.00..91.06 rows=2358 width=26)
(actual time=0.011..1.123 rows=2358 loops=1)
Filter: (("timestamp" >= '2018-05-10 00:00:00'::timestamp without time zone)
AND ("timestamp" < '2018-05-13 00:00:00'::timestamp without time zone)
AND (granularity = '1 day'::interval))
Planning time: 0.180 ms
Execution time: 1.294 ms
使用语言sql 快速
可以肯定它比使用语言 plpgsql 更快但比原始查询稍慢(重复运行后)
=========更新=========
CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)
RETURNS numeric LANGUAGE sql STABLE AS
'SELECT $1 / NULLIF($2, 0)';
使用上述函数获得的最佳结果(甚至比普通查询更快)
答案 0 :(得分:2)
The poison dart is SECURITY DEFINER
. Functions declared SECURITY DEFINER
cannot be inlined - and enforce a context switch if I am not mistaken. That can make them considerably more expensive. There is really no need for SECURITY DEFINER
in the example. You do not need different privileges for the simple calculation. (Maybe your actual use case is different.)
And there is no need for PL/pgSQL either. Only SQL functions can be inlined - if some additional preconditions are met.
Since all used functions are IMMUTABLE
, you should declare the function IMMUTABLE
. (Default function volatility is VOLATILE
.) You already updated the question accordingly. That allows expression indexes and can help prevent repeated evaluation in some situations. But it never helps with function inlining. Au contraire: it imposes more preconditions (which are met in this case). Quoting the Postgres Wiki on function inlining (last update 2016 at the time of writing):
if the function is declared
IMMUTABLE
, then the expression must not invoke any non-immutable function or operator
Quoting Tom Lane on pgsql-performance:
The basic point here is that a function marked volatile can be expanded to its contained functions even if they're immutable; but the other way around represents a potential semantic change, so the planner won't do it.
Try without SECURITY DEFINER
:
CREATE FUNCTION kpi_latency_ext_msec(val1 numeric, val2 numeric)
RETURNS numeric AS
$func$
BEGIN
RETURN $1 / NULLIF($2, numeric '0');
END
$func$ LANGUAGE plpgsql IMMUTABLE;
Should be much faster already.
Or radically simplify to an SQL function:
CREATE FUNCTION f_div0_sql_nullif(val1 numeric, val2 numeric)
RETURNS numeric LANGUAGE sql IMMUTABLE AS
$$SELECT $1 / NULLIF($2, numeric '0')$$;
Faster, yet?
Related:
I used IF
and CASE
expressions at first, but after a_horse_with_no_name's comment I ran extensive tests showing NULLIF
to be slightly faster. So I simplified to the original NULLIF
variant accordingly.
The major points are still no SECURITY DEFINER
, SQL
and IMMUTABLE
.