I have part of a table that looks like this:
timestamp | Source
----------------------------+----------
2017-07-28 14:20:28.757464 | Stream
2017-07-28 14:20:28.775248 | Poll
2017-07-28 14:20:29.777678 | Poll
2017-07-28 14:21:28.582532 | Stream
I would like to get this:
timestamp | Source
----------------------------+----------
2017-07-28 14:20:28.757464 | Stream
2017-07-28 14:20:29.777678 | Poll
2017-07-28 14:21:28.582532 | Stream
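Here the second row of the original table has been removed because it lies within 50 ms before or after another timestamp. Importantly, a row should only be removed when Source = 'Poll'.
I am not sure how to achieve this with a WHERE clause.
Thanks in advance for your help.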
Answer 0 (score: 0)
Whatever filtering we do, we can restrict it to the Poll rows and then union the result with the Stream rows.
with
streams as (
    select *
    from test
    where Source = 'Stream'
),
pools as (
    ...
)
(select * from pools) union (select * from streams) order by timestamp
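To try the queries in this answer locally, a minimal setup might look like the following. The table name test comes from the snippets here and the rows are copied from the question; the column types are an assumption, since the question does not state them.

```sql
-- Hypothetical setup: column types are assumed, not given in the question.
create table test (
    "timestamp" timestamp not null,
    Source      text      not null
);

-- The four sample rows from the question.
insert into test ("timestamp", Source) values
    ('2017-07-28 14:20:28.757464', 'Stream'),
    ('2017-07-28 14:20:28.775248', 'Poll'),
    ('2017-07-28 14:20:29.777678', 'Poll'),
    ('2017-07-28 14:21:28.582532', 'Stream');
```

The column name is double-quoted here only as a precaution, because timestamp is also a type name.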
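There are several ways to build the Poll rows (the pools CTE):
For each row, we run an extra query to fetch the timestamp of the previous row with the same source, and then keep only the rows that either have no previous timestamp (the first row) or whose previous timestamp is more than 50 ms earlier.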
with
...
pools_with_prev as (
    -- use a correlated subquery to find the previous
    -- timestamp of the same source
    select
        timestamp, Source,
        timestamp - interval '00:00:00.05'
            as timestamp_prev_limit,
        (select max(t2.timestamp) from test as t2
         where t2.timestamp < test.timestamp and
               t2.Source = test.Source)
            as timestamp_prev
    from test
),
pools as (
    select timestamp, Source
    from pools_with_prev
    -- keep only Poll rows which are >50ms after the previous Poll row
    where Source = 'Poll' and
          (timestamp_prev is NULL or
           timestamp_prev < timestamp_prev_limit)
)
...
https://www.db-fiddle.com/f/iVgSkvTVpqjNZ5F5RZVSd2/2
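Instead of running a subquery for every row, we can make a copy of the table and slide it down by one row, so that each Poll row is joined with the previous row of the same source.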
with
...
pools_rn as (
    -- add extra row number column
    -- row numbers: 1, 2, 3
    select *,
        row_number() over (order by timestamp) as rn
    from test
    where Source = 'Poll'
),
pools_rn_prev as (
    -- add extra row number column increased by one
    -- like sliding a copy of the table one row down
    -- row numbers: 2, 3, 4
    select timestamp as timestamp_prev,
        row_number() over (order by timestamp)+1 as rn
    from test
    where Source = 'Poll'
),
pools as (
    -- now join the two previous CTEs on this column
    -- each row will join with its predecessor
    select timestamp, Source
    from pools_rn
    left outer join pools_rn_prev
        on pools_rn.rn = pools_rn_prev.rn
    where
        -- then select rows which are >50ms apart
        timestamp_prev is null or
        timestamp - interval '00:00:00.05' > timestamp_prev
)
...
https://www.db-fiddle.com/f/gXmSxbqkrxpvksE8Q4ogEU/2
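Modern SQL can do the same thing with a window function: partition by source, then use lag() to attach the previous row's timestamp to each row.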
with
...
pools_with_prev as (
    -- use a window function to fetch the previous timestamp
    select *,
        timestamp - interval '00:00:00.05'
            as timestamp_prev_limit,
        lag(timestamp) over(
            partition by Source order by timestamp
        ) as timestamp_prev
    from test
),
pools as (
    select timestamp, Source
    from pools_with_prev
    -- keep only Poll rows which are >50ms after the previous Poll row
    where Source = 'Poll' and
          (timestamp_prev is NULL or
           timestamp_prev < timestamp_prev_limit)
)
...
https://www.db-fiddle.com/f/8KfTyqRBU62SFSoiZfpu6Q/1
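Since the fragments in this answer elide parts with "...", here is one way the skeleton and the lag() variant could be assembled into a single statement. This is a sketch rather than the exact fiddle contents: it assumes PostgreSQL, the table test with columns timestamp and Source, and it filters to Poll rows before applying lag() instead of partitioning by Source.

```sql
-- Assembled sketch: skeleton + lag() variant in one statement.
with
streams as (
    select "timestamp", Source
    from test
    where Source = 'Stream'
),
pools_with_prev as (
    -- previous Poll timestamp for every Poll row (null for the first one)
    select "timestamp", Source,
           lag("timestamp") over (order by "timestamp") as timestamp_prev
    from test
    where Source = 'Poll'
),
pools as (
    -- keep Poll rows that are more than 50 ms after the previous Poll row
    select "timestamp", Source
    from pools_with_prev
    where timestamp_prev is null
       or timestamp_prev < "timestamp" - interval '50 milliseconds'
)
select * from pools
union all
select * from streams
order by "timestamp";
```

Because the two CTEs are disjoint by construction, union all is enough here and avoids the duplicate-elimination step of union. Note that this compares each Poll row only with other Poll rows, not with Stream rows.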
I believe this last variant is the most efficient.
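All of the variants above compare a row with the previous row of the same source. If the intent is instead to drop a Poll row whenever a Stream row lies within 50 ms before or after it (which is one reading of the question, and an assumption on my part), a NOT EXISTS filter may be closer to what is wanted. A minimal sketch, assuming PostgreSQL and the same table test:

```sql
-- Sketch: drop Poll rows that lie within 50 ms of any Stream row.
-- Assumes table test with columns "timestamp" and Source, as above.
select t."timestamp", t.Source
from test as t
where t.Source <> 'Poll'          -- non-Poll rows are always kept
   or not exists (
        select 1
        from test as s
        where s.Source = 'Stream'
          and s."timestamp" between t."timestamp" - interval '50 milliseconds'
                                and t."timestamp" + interval '50 milliseconds'
   )
order by t."timestamp";
```

On the four sample rows from the question, the Poll row at 14:20:28.775248 is about 18 ms after the Stream row at 14:20:28.757464 and is filtered out, while the Poll row at 14:20:29.777678 has no Stream row within 50 ms and is kept, which matches the desired output.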