Perform a one-time calculation in an SQL query

Posted: 2013-08-29 07:10:30

Tags: postgresql timestamp unix-timestamp epoch compile-time-constant

I have this query (edited for simplicity):

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000
order by s.sampletimestamp;

I noticed that it runs faster if I type in the time values by hand:

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= 1376143200000 and
s.sampletimestamp < 1376229600000
order by s.sampletimestamp;

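where the times are epoch timestamps in milliseconds. My guess is that the database is evaluating the extract(EPOCH...) part for every record, when it only needs to do this once.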

Is there some way to keep the more readable form of the first query while making it as efficient as the second?
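If the query is issued from application code, one way to keep the readable dates and still send plain integers is to compute the bounds once on the client and pass them as query parameters, just like the literals in the second query. The Python sketch below assumes the question's fixed +10:00 offset; `epoch_ms` and `UTC_PLUS_10` are hypothetical helper names, not part of any library.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical name for the fixed +10:00 offset used in the question's literals.
UTC_PLUS_10 = timezone(timedelta(hours=10))


def epoch_ms(dt: datetime) -> int:
    """Return whole milliseconds since the Unix epoch for an aware datetime."""
    if dt.tzinfo is None:
        raise ValueError("dt must be timezone-aware")
    # timedelta // timedelta yields an int, so no floating point is involved.
    return (dt - EPOCH) // timedelta(milliseconds=1)


start = epoch_ms(datetime(2013, 8, 11, tzinfo=UTC_PLUS_10))
end = epoch_ms(datetime(2013, 8, 12, tzinfo=UTC_PLUS_10))
print(start, end)  # 1376143200000 1376229600000
```

The two integers match the hand-typed literals in the second query, so the server only ever compares sampletimestamp against integer constants.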

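I'm new to PostgreSQL (and entirely self-taught), so I think my biggest problem is not knowing which particular keywords to put into Google. I have already tried Google as well as the PostgreSQL documentation.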

Thanks in advance :)

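EDIT1: Thanks for the very detailed replies. I suspect I may be in a different time zone from most of the respondents; I'll provide experimental evidence tomorrow (it's already late here).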

EDIT2: To summarise the answers below, casting to bigint solves the problem. Replace:

where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000

with:

where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')::bigint*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')::bigint*1000
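Note that in this form the cast is applied before the multiplication, so any fractional second in a bound is rounded away before scaling to milliseconds. For the whole-second bounds in the question this makes no difference; a bound with a fractional second would keep its milliseconds only if the cast came after the multiplication. The Python sketch below mimics the two orderings, using round() as a stand-in for PostgreSQL's double precision to bigint cast (which rounds rather than truncates); the 0.25 s fraction is a made-up value for illustration.

```python
# Epoch seconds as double precision, as extract(EPOCH FROM ...) returns them.
whole = 1376143200.0        # '2013-08-11 00:00:00+10'
fractional = 1376143200.25  # hypothetical bound with a 250 ms component


def cast_then_scale(seconds: float) -> int:
    # Mimics extract(...)::bigint * 1000: round to whole seconds first.
    return round(seconds) * 1000


def scale_then_cast(seconds: float) -> int:
    # Mimics (extract(...) * 1000)::bigint: scale to milliseconds, then round.
    return round(seconds * 1000)


print(cast_then_scale(whole), scale_then_cast(whole))
# 1376143200000 1376143200000
print(cast_then_scale(fractional), scale_then_cast(fractional))
# 1376143200000 1376143200250
```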

1 Answer

Answer (score: 2)

What's happening here is that extract is implemented using the date_part function:

regress=> explain select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000;
 QUERY PLAN
--------------------------------------------------
 Aggregate  (cost=30.02..30.03 rows=1 width=0)
   ->  Function Scan on generate_series x  (cost=0.00..30.00 rows=5 width=0)
         Filter: (((x)::double precision > (date_part('epoch'::text, '2013-08-10 22:00:00+08'::timestamp with time zone) * 1000::double precision)) AND ((x)::double precision < (date_part('epoch'::text, '2013-08-11 22:00:00+08'::timestamp with time zone) * 1000::double precision)))
(3 rows)

date_part(text, timestamptz) is defined as stable rather than immutable:

regress=> \df+ date_part
                                                                                                                 List of functions
   Schema   |   Name    | Result data type |        Argument data types        |  Type  | Volatility |  Owner   | Language |                               Source code                                |                 Description                 
------------+-----------+------------------+-----------------------------------+--------+------------+----------+----------+--------------------------------------------------------------------------+---------------------------------------------
 ...
 pg_catalog | date_part | double precision | text, timestamp with time zone    | normal | stable     | postgres | internal | timestamptz_part                                                         | extract field from timestamp with time zone
 ...

I'm pretty sure this prevents Pg from precomputing the value and inlining it into the call. I'd need to dig deeper to be certain.

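I think the reason is that date_part on a timestamptz can depend on the value of the TimeZone setting. That isn't true of date_part('epoch', some_timestamptz), but the query planner doesn't know at planning time that this is the field you're using.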
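The distinction can be illustrated outside the database. In the Python sketch below, fixed UTC offsets stand in for different TimeZone settings (an assumption: real zones also carry DST rules). Viewing the same instant from two zones leaves the epoch value unchanged but changes the hour, which is why date_part on a timestamptz as a whole cannot be marked immutable even though the 'epoch' field is zone-independent. The UTC+8 view matches the '2013-08-10 22:00:00+08' literal in the EXPLAIN output above.

```python
from datetime import datetime, timedelta, timezone

# One absolute instant: the question's lower bound, 2013-08-11 00:00:00+10.
instant = datetime(2013, 8, 11, tzinfo=timezone(timedelta(hours=10)))

# Two hypothetical session time zones, modelled as fixed UTC offsets.
session_zones = {
    "UTC+8": timezone(timedelta(hours=8)),
    "UTC+10": timezone(timedelta(hours=10)),
}

results = {}
for name, tz in session_zones.items():
    local = instant.astimezone(tz)
    # Epoch seconds are the same in every zone; the hour field is not.
    results[name] = (local.timestamp(), local.hour)

print(results)
# {'UTC+8': (1376143200.0, 22), 'UTC+10': (1376143200.0, 0)}
```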

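I'm still surprised that the value isn't precomputed, since the documentation states:

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.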
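That wording describes what the optimizer is permitted to do, not a promise that it will. The EXPLAIN output above shows the stable date_part call left inside the Filter, where it is evaluated once per row. The toy model below (plain Python, not PostgreSQL's actual planner code; `Func` and `run_filter` are hypothetical names, and the function names are just labels) contrasts that with constant-folding an immutable call once at plan time, and counts how often each function body runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Func:
    name: str
    volatility: str               # "immutable" or "stable", as in pg_proc
    impl: Callable[[], float]
    calls: int = 0

    def __call__(self) -> float:
        self.calls += 1
        return self.impl()


def run_filter(bound: Func, rows: range) -> int:
    """Toy executor: count rows with x < bound(), folding only immutable calls."""
    if bound.volatility == "immutable":
        folded = bound()                  # evaluated once, at "plan time"
        get_bound = lambda: folded
    else:
        get_bound = bound                 # left in the filter: runs for every row
    return sum(1 for x in rows if x < get_bound())


rows = range(1000)
stable = Func("date_part(text, timestamptz)", "stable", lambda: 500.0)
immutable = Func("date_part(text, timestamp)", "immutable", lambda: 500.0)

stable_matches = run_filter(stable, rows)
immutable_matches = run_filter(immutable, rows)
print(stable_matches, stable.calls)        # 500 1000
print(immutable_matches, immutable.calls)  # 500 1
```

Both filters return the same rows; only the number of function evaluations differs, which mirrors the timing gap between the first two queries below.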

You can work around this apparent limitation by first converting to a UTC timestamp with AT TIME ZONE 'UTC' (or whatever time zone your epochs are in). E.g.:

select count(1) 
from generate_series(1376143200000,1376143200000+1000000) x 
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000 
and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000;
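Why AT TIME ZONE 'UTC' gives the same number: it turns the timestamptz into a timestamp without time zone holding the UTC wall-clock time, and extract(EPOCH FROM ...) on a timestamp without time zone counts nominal seconds since 1970-01-01 00:00:00 without consulting any zone. For a UTC wall-clock time that nominal count equals the true epoch, and because the TimeZone setting no longer matters, the timestamp variant of date_part can be immutable and folded to a constant. A Python analogue, with naive datetimes playing the role of timestamp without time zone:

```python
from datetime import datetime, timedelta, timezone

instant = datetime(2013, 8, 11, tzinfo=timezone(timedelta(hours=10)))

# Analogue of "timestamptz AT TIME ZONE 'UTC'": a naive wall-clock time in UTC.
naive_utc = instant.astimezone(timezone.utc).replace(tzinfo=None)

# Analogue of extract(EPOCH FROM <timestamp without time zone>):
# nominal seconds since 1970-01-01 00:00:00, with no zone involved.
nominal_epoch = (naive_utc - datetime(1970, 1, 1)) / timedelta(seconds=1)

print(naive_utc, nominal_epoch, instant.timestamp())
# 2013-08-10 14:00:00 1376143200.0 1376143200.0
```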

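The AT TIME ZONE version executes faster, though it is still slower than the literal-constant version by more than I'd expect if the value were being computed only once: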

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000;
  count  
---------
 1000000
(1 row)

Time: 767.629 ms

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000;
  count  
---------
 1000000
(1 row)

Time: 373.453 ms

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > 1376143200000 and x <  1376229600000;
  count  
---------
 1000000
(1 row)

Time: 324.557 ms
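A rough way to read these numbers is as extra time per input row relative to the integer-literal query. The Python arithmetic below uses only the three timings shown; each is a single run, so the results are indicative at best. The timestamptz version spends roughly 440 ns more per row, consistent with date_part being evaluated for each row, while the AT TIME ZONE version is within about 50 ns per row of the integer case.

```python
# Single-run timings (ms) copied from the psql sessions above.
timings_ms = {
    "timestamptz extract": 767.629,
    "AT TIME ZONE 'UTC'": 373.453,
    "integer literals": 324.557,
}
# generate_series(a, a + 1000000) produces 1,000,001 rows to filter.
rows = 1_000_001

baseline = timings_ms["integer literals"]
overhead = {}
for label, ms in timings_ms.items():
    ratio = ms / baseline
    extra_ns_per_row = (ms - baseline) / rows * 1e6  # ms -> ns, spread per row
    overhead[label] = (ratio, extra_ns_per_row)
    print(f"{label:20s} {ratio:5.2f}x  {extra_ns_per_row:6.1f} ns/row extra")
```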

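This query optimizer limitation could be removed, or a feature added to optimize this case. The optimizer would probably need to recognise extract(EPOCH FROM ...) as a special case at parse time and, instead of calling date_part('epoch', ...), call a special immutable timestamptz_epoch(...) function.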

A quick look at perf top shows the following peaks for the timestamptz case:

 10.33%  postgres      [.] ExecMakeFunctionResultNoSets
  7.76%  postgres      [.] timesub.isra.1
  6.94%  postgres      [.] datebsearch
  5.58%  postgres      [.] timestamptz_part
  3.82%  postgres      [.] AllocSetAlloc
  2.97%  postgres      [.] ExecEvalConst
  2.68%  postgres      [.] downcase_truncate_identifier
  2.38%  postgres      [.] ExecEvalScalarVarFast
  2.23%  postgres      [.] slot_getattr
  1.99%  postgres      [.] DatumGetFloat8

With AT TIME ZONE we get:

 11.58%  postgres      [.] ExecMakeFunctionResultNoSets
  4.28%  postgres      [.] AllocSetAlloc
  4.18%  postgres      [.] ExecProject
  3.82%  postgres      [.] slot_getattr
  2.99%  libc-2.17.so  [.] __memmove_ssse3
  2.96%  postgres      [.] BufFileWrite
  2.80%  libc-2.17.so  [.] __memcpy_ssse3_back
  2.74%  postgres      [.] BufFileRead
  2.69%  postgres      [.] float8lt

and with the integer literals:

  7.92%  postgres      [.] ExecMakeFunctionResultNoSets
  5.36%  postgres      [.] slot_getattr
  4.52%  postgres      [.] AllocSetAlloc
  4.02%  postgres      [.] ExecProject
  3.42%  libc-2.17.so  [.] __memmove_ssse3
  3.33%  postgres      [.] BufFileWrite
  3.31%  libc-2.17.so  [.] __memcpy_ssse3_back
  2.91%  postgres      [.] BufFileRead
  2.90%  postgres      [.] GetMemoryChunkSpace
  2.67%  postgres      [.] AllocSetFree

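So you can see that the AT TIME ZONE version avoids the repeated timestamptz_part and datebsearch calls. The main remaining difference from the integer case is float8lt: it looks like we're doing double precision comparisons instead of integer comparisons.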
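The EXPLAIN output earlier shows where float8lt comes from: when a bigint is compared with a double precision value, the bigint side is cast, (x)::double precision, for every row. For millisecond epochs this costs time but does not change the result, because every integer up to 2^53 converts to a double exactly. A small Python check of that claim around the question's two bounds:

```python
# The question's bounds in epoch milliseconds (bigint in PostgreSQL).
lo_ms, hi_ms = 1376143200000, 1376229600000

# The same bounds as extract(...) * 1000 produces them: double precision.
lo_f = float(1376143200) * 1000
hi_f = float(1376229600) * 1000

# Integers of magnitude up to 2**53 convert to a double without rounding,
# and these values are far below that limit.
print(hi_ms < 2**53)  # True

# A few values on either side of each bound.
sample = [*range(lo_ms - 3, lo_ms + 4), *range(hi_ms - 3, hi_ms + 4)]

# Filter as in the EXPLAIN output: the integer is cast to double first.
via_double = [x for x in sample if lo_f < float(x) < hi_f]
# Filter with pure integer comparisons, as after the ::bigint cast.
via_bigint = [x for x in sample if lo_ms < x < hi_ms]

print(via_double == via_bigint)  # True
```

So the ::bigint cast is a performance fix here: it keeps the comparison in integer arithmetic without altering which rows match.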

Sure enough, casting the extracted epoch to bigint restores integer comparisons:

select count(1) 
from generate_series(1376143200000,1376143200000+1000000) x
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')::bigint * 1000  
and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')::bigint * 1000;

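I don't have time right now to work on the optimizer enhancement discussed above, but you might want to consider raising it on the mailing list.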