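I have a Postgres 9.6 table, 'prices', with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a datetime in standard format, such as '2017-05-01 00:00:00.585', with fractional seconds. Any given second may contain anywhere from no records to several dozen.
I can get the first and last record within each second with this query: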
SELECT uid, bid, ask, dt,
       CASE
           WHEN rn1 = 1 THEN 'First'
           WHEN rn2 = 1 THEN 'Last'
           ELSE 'Somewhere in the middle'
       END AS Which_row_within_a_second
FROM (
    SELECT *,
           row_number() OVER (PARTITION BY date_trunc('second', dt)
                              ORDER BY dt
                             ) rn1,
           row_number() OVER (PARTITION BY date_trunc('second', dt)
                              ORDER BY dt DESC
                             ) rn2
    FROM prices
    WHERE instrument = 'xxxxxx'
      AND dt >= '2017-05-01 00:00:00'
      AND dt < '2017-05-02 00:00:00'
) xx
WHERE
    1 IN (rn1, rn2)
ORDER BY dt
;
However, I need this for an arbitrary period, for example 5 seconds, 1 hour, 2 hours 30 seconds, 1 day, and so on:
uid                                   bid      ask      dt                       which_row_within_a_second
4ecaa607-3733-4aba-9093-abc8f59e1638  0.84331  0.8434   2017-05-01 00:00:00.031  First
cf6d5341-f7fd-47bc-89f6-a5448f78fb99  0.84329  0.84339  2017-05-01 00:00:00.943  Last
6dbf8d8e-37c8-4537-80b5-c9219f4356b1  0.8433   0.84339  2017-05-01 00:00:05.079  First
f9937464-e36a-4c57-a212-2f32943307d3  0.8433   0.84338  2017-05-01 00:00:05.83   Last
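Note the dt column: the interval here is 5 seconds.
The query is also a bit slow, so I am looking for a performance improvement if possible. There is an index on 'instrument', a composite index on 'instrument, dt, bid, ask', and a composite index on 'dt, bid, ask'.
Any ideas?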
Answer 0 (score: 0)
You have to create the ranges dynamically based on the user's input. For example, if your range is 5 seconds:
WITH ranges AS (
    -- One row per 5-second window. generate_series includes its stop value,
    -- so stop one step early to avoid a window that starts on the next day.
    SELECT dd AS start_range,
           dd + '5 seconds'::interval AS end_range,
           ROW_NUMBER() OVER (ORDER BY dd) AS grp
    FROM generate_series
         ( '2017-05-01 00:00:00'::timestamp
         , '2017-05-02 00:00:00'::timestamp - '5 seconds'::interval
         , '5 seconds'::interval) dd
), create_grp AS (
    -- Assign each price row to the window that contains its dt.
    SELECT r.grp, r.start_range, r.end_range, p.*
    FROM prices p
    JOIN ranges r
      ON p.dt >= r.start_range
     AND p.dt < r.end_range
    WHERE p.instrument = 'xxxxxx'  -- same filter as in the question
), minmax AS (
    -- Rank rows within each window from both ends.
    SELECT row_number() OVER (PARTITION BY grp
                              ORDER BY dt ASC) AS rn1,
           row_number() OVER (PARTITION BY grp
                              ORDER BY dt DESC) AS rn2,
           create_grp.*
    FROM create_grp
)
SELECT uid, bid, ask, dt,
       CASE WHEN rn1 = 1 AND rn2 = 1 THEN 'first and last'
            WHEN rn1 = 1 THEN 'first'
            WHEN rn2 = 1 THEN 'last'
       END AS row_position
FROM minmax
WHERE 1 IN (rn1, rn2)
ORDER BY dt;
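There is a special case when a group range contains only one row: that row is both the first and the last, which is why the CASE checks rn1 = 1 and rn2 = 1 before the other branches.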
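As a possible alternative to joining against generated ranges, the group number can be computed directly from dt. This is only a minimal sketch under a few assumptions, not a tested replacement: it assumes dt is a timestamp, that the period is expressed in whole seconds (5 here; 3600 would be 1 hour, 7230 would be 2 hours 30 seconds), and that the groups should be anchored at the start of the queried range. The table, columns, and instrument filter are taken from the question. Whether it is actually faster depends on the data and the indexes, so it is worth comparing both versions with EXPLAIN ANALYZE.

```sql
-- Sketch: bucket rows by an arbitrary period using epoch arithmetic.
-- Assumption: the period length is 5 seconds; change the divisor for other periods.
WITH bucketed AS (
    SELECT uid, bid, ask, dt,
           -- Seconds elapsed since the range start, divided by the period
           -- length and rounded down, gives the bucket number.
           floor(extract(epoch FROM (dt - timestamp '2017-05-01 00:00:00')) / 5) AS grp
    FROM prices
    WHERE instrument = 'xxxxxx'
      AND dt >= '2017-05-01 00:00:00'
      AND dt <  '2017-05-02 00:00:00'
), ranked AS (
    -- Rank rows within each bucket from both ends.
    SELECT bucketed.*,
           row_number() OVER (PARTITION BY grp ORDER BY dt ASC)  AS rn1,
           row_number() OVER (PARTITION BY grp ORDER BY dt DESC) AS rn2
    FROM bucketed
)
SELECT uid, bid, ask, dt,
       CASE WHEN rn1 = 1 AND rn2 = 1 THEN 'first and last'
            WHEN rn1 = 1 THEN 'first'
            WHEN rn2 = 1 THEN 'last'
       END AS row_position
FROM ranked
WHERE 1 IN (rn1, rn2)
ORDER BY dt;
```

Because the bucket number is derived from each row alone, no join is needed and empty periods simply produce no rows, which matches the behaviour of the inner join above.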