How to query the first and last record of every 5 seconds (or any other user-defined interval) within a selected time range in Postgres

Time: 2017-09-21 17:03:58

Tags: postgresql

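I have a Postgres 9.6 table, 'prices', with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a datetime in the standard format, such as '2017-05-01 00:00:00.585', with fractional seconds. Each second may contain anywhere from zero to dozens of records.

I can get the first and last record of each second: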

SELECT uid, bid, ask, dt,
       CASE
           WHEN rn1 = 1 THEN 'First'
           WHEN rn2 = 1 THEN 'Last'
           ELSE 'Somewhere in the middle'
        END as Which_row_within_a_second
FROM (
   select *,
       row_number() over( partition by date_trunc('second', dt)
                          order by dt
       ) rn1,
       row_number() over( partition by date_trunc('second', dt)
                          order by dt DESC
       ) rn2       
   from prices
   where instrument = 'xxxxxx' 
         AND dt >= '2017-05-01 00:00:00'
         AND dt < '2017-05-02 00:00:00'
) xx
WHERE
    1 IN (rn1, rn2 )
ORDER BY dt
;

However, I need this for any period, for example 5 seconds, 1 hour, 2 hours 30 seconds, 1 day, and so on:

uid                                   bid      ask      dt                       which_row_within_a_second
4ecaa607-3733-4aba-9093-abc8f59e1638  0.84331  0.8434   2017-05-01 00:00:00.031  First
cf6d5341-f7fd-47bc-89f6-a5448f78fb99  0.84329  0.84339  2017-05-01 00:00:00.943  Last
6dbf8d8e-37c8-4537-80b5-c9219f4356b1  0.8433   0.84339  2017-05-01 00:00:05.079  First
f9937464-e36a-4c57-a212-2f32943307d3  0.8433   0.84338  2017-05-01 00:00:05.83   Last

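Note the dt column: the interval is 5 seconds.

The query is also a bit slow, so I am looking for a performance improvement if possible. There is an index on 'instrument', a composite index on 'instrument, dt, bid, ask', and another composite index on 'dt, bid, ask'.

Any ideas?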

1 Answer:

Answer 0: (score: 0)

You have to create the ranges dynamically based on the user's input. For example, if your range is 5 seconds:

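WITH ranges as (
    -- one row per 5-second bucket: [start_range, end_range)
    SELECT dd as start_range, 
           dd + '5 seconds'::interval as end_range, 
           ROW_NUMBER() over (order by dd) as grp
    FROM generate_series
            ( '2017-05-01 00:00:00'::timestamp 
              -- generate_series includes its stop value, so stop one step
              -- early to keep the last bucket inside the selected day
            , '2017-05-02 00:00:00'::timestamp - '5 seconds'::interval
            , '5 seconds'::interval) dd
), create_grp as (
    -- attach each price row to the bucket that contains its dt
    SELECT r.grp, r.start_range, r.end_range, p.*
    FROM prices p
    JOIN ranges r
      ON p.dt >= r.start_range   -- the timestamp column is dt, not date
     AND p.dt < r.end_range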
), minmax as ( 
   SELECT row_number() over (partition by grp
                             order by dt asc) as rn1,
          row_number() over (partition by grp
                             order by dt desc) as rn2,              
          create_grp.*
   FROM create_grp 
)
SELECT uid, bid, ask, dt,
       CASE WHEN rn1 = 1 and rn2 = 1 THEN 'first and last'
            WHEN rn1 = 1 THEN 'first'
            WHEN rn2 = 1 THEN 'last'
       END as row_position
FROM minmax
WHERE 1 IN (rn1, rn2)

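There is a special case when a range group contains only one row: that row is both the first and the last in its group, which is why the CASE expression checks for 'first and last' before the other two branches.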
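
As a possible alternative to joining against generated ranges, the bucket number can be computed directly from dt with arithmetic on the epoch. The following is a minimal sketch, not a tested replacement for the query above. It assumes buckets are anchored at the start of the selected range, and it uses a hypothetical inline sample_prices data set (with made-up integer uids) in place of the real prices table so that it runs on its own. Whether it is actually faster on hundreds of millions of rows would need to be checked with EXPLAIN ANALYZE.

```sql
-- sample_prices is a hypothetical stand-in for the real prices table
WITH sample_prices (uid, bid, ask, dt) AS (
    VALUES (1, 0.84331, 0.84340, '2017-05-01 00:00:00.031'::timestamp),
           (2, 0.84329, 0.84339, '2017-05-01 00:00:04.943'::timestamp),
           (3, 0.84330, 0.84339, '2017-05-01 00:00:05.079'::timestamp),
           (4, 0.84330, 0.84338, '2017-05-01 00:00:12.500'::timestamp)
), bucketed AS (
    SELECT s.*,
           -- seconds elapsed since the range start, divided by the bucket
           -- width in seconds; change the interval literal to change the width
           floor( extract(epoch FROM (s.dt - '2017-05-01 00:00:00'::timestamp))
                / extract(epoch FROM '5 seconds'::interval) ) AS grp
    FROM sample_prices s
    WHERE s.dt >= '2017-05-01 00:00:00'
      AND s.dt <  '2017-05-02 00:00:00'
), ranked AS (
    SELECT b.*,
           row_number() OVER (PARTITION BY grp ORDER BY dt ASC)  AS rn1,
           row_number() OVER (PARTITION BY grp ORDER BY dt DESC) AS rn2
    FROM bucketed b
)
SELECT uid, bid, ask, dt,
       CASE WHEN rn1 = 1 AND rn2 = 1 THEN 'first and last'
            WHEN rn1 = 1 THEN 'first'
            WHEN rn2 = 1 THEN 'last'
       END AS row_position
FROM ranked
WHERE 1 IN (rn1, rn2)
ORDER BY dt;
```

With this sample data, uid 1 and uid 2 fall in the same 5-second bucket and come back as 'first' and 'last', while uid 3 and uid 4 are each alone in their bucket and come back as 'first and last'. Against the real table, replace sample_prices with prices and add the instrument filter from the question to the WHERE clause of bucketed.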