WHERE clause to exclude rows whose timestamps are within 50 ms of each other?

Time: 2017-08-30 23:58:04

Tags: postgresql timestamp where lag lead

I have part of a table that looks like this:

 timestamp                  | Source
----------------------------+----------
 2017-07-28 14:20:28.757464 | Stream
 2017-07-28 14:20:28.775248 | Poll
 2017-07-28 14:20:29.777678 | Poll
 2017-07-28 14:21:28.582532 | Stream

I want to achieve this:

 timestamp                  | Source
----------------------------+----------
 2017-07-28 14:20:28.757464 | Stream
 2017-07-28 14:20:29.777678 | Poll
 2017-07-28 14:21:28.582532 | Stream

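Here the second row of the original table has been removed because it falls within 50 ms of the timestamp before or after it. Importantly, a row should only be removed when Source = 'Poll'.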

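I'm not sure how to achieve this with a WHERE clause?

Thanks in advance for your help.

1 Answer:

Answer 0: (score: 0)

Whatever we do, we can restrict it to the Poll rows (the pools CTE below) and then union those rows with the Stream rows.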

with 
streams as (
 select *
 from test 
 where Source = 'Stream'  
),
pools as (
  ...
)

(select * from pools) union (select * from streams) order by timestamp
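
To try the queries in this answer locally, a test table is needed. The following is a minimal setup sketch, assuming PostgreSQL: the table name test comes from the queries above, but the column types are an assumption, since the question does not state them. The column name is quoted here only as a precaution, because timestamp is also a type name.

```sql
-- Hypothetical setup: column types are assumed, not given in the question.
create table test (
  "timestamp" timestamp not null,
  source      text      not null
);

-- The four sample rows from the question.
insert into test ("timestamp", source) values
  ('2017-07-28 14:20:28.757464', 'Stream'),
  ('2017-07-28 14:20:28.775248', 'Poll'),
  ('2017-07-28 14:20:29.777678', 'Poll'),
  ('2017-07-28 14:21:28.582532', 'Stream');
```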

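There are several options for building the pools CTE:

Correlated subquery

For each row, we run an extra query to fetch the timestamp of the previous row with the same source. We then keep only the rows that have no previous timestamp (the first row) or whose previous timestamp is more than 50 ms earlier.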

with 
...
pools_with_prev as (
  -- use correlated subquery
  select 
    timestamp, Source, 
    timestamp - interval '00:00:00.05' 
      as timestamp_prev_limit,
    (select max(t2.timestamp)from test as t2 
      where t2.timestamp < test.timestamp and
     t2.Source = test.Source) 
      as timestamp_prev
  from test
),
pools as (
  select timestamp, Source
  from pools_with_prev
  -- then select rows which are >50ms apart
  where timestamp_prev is NULL or
  timestamp_prev < timestamp_prev_limit
)

...

https://www.db-fiddle.com/f/iVgSkvTVpqjNZ5F5RZVSd2/2
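
Because the correlated subquery looks up the latest earlier timestamp of the same source once per row, a composite index may help on larger tables. This is only a sketch: the index name is hypothetical, and whether the planner actually uses the index depends on the data and should be checked with EXPLAIN.

```sql
-- Hypothetical index name; covers the subquery's filter on source
-- and its range condition on the timestamp column.
create index test_source_timestamp_idx
  on test (source, "timestamp");
```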

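Joining two shifted tables

Instead of running a subquery for each row, we can create a copy of the table and shift it by one row, so that each Poll row joins with the previous row of the same source type.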

with 
...
pools_rn as (
 -- add extra row number column
 -- rows: 1, 2, 3
 select *,
  row_number() over (order by timestamp) as rn
 from test
 where Source = 'Poll'  
),
pools_rn_prev as (
 -- add extra row number column increased by one
 -- like sliding a copy of the table one row down
 -- rows: 2, 3, 4
 select timestamp as timestamp_prev,
  row_number() over (order by timestamp)+1 as rn
 from test
 where Source = 'Poll'  
),
pools as (
 -- now join prev two tables on this column
 -- each row will join with its predecessor
 select timestamp, source 
 from pools_rn
  left outer join pools_rn_prev
  on pools_rn.rn = pools_rn_prev.rn
 where 
  -- then select rows which are >50ms apart
  timestamp_prev is null or
  timestamp - interval '00:00:00.05' > timestamp_prev
)

...

https://www.db-fiddle.com/f/gXmSxbqkrxpvksE8Q4ogEU/2

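Sliding window

Modern SQL can do something similar with window functions: partition by source, then use lag() to pair each row with the previous row in its partition.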

with 
...
pools_with_prev as (
  -- use sliding window to join prev timestamp
  select *, 
    timestamp - interval '00:00:00.05' 
      as timestamp_prev_limit,
    lag(timestamp) over(
      partition by Source order by timestamp
    ) as timestamp_prev
  from test
),
pools as (
  select timestamp, Source
  from pools_with_prev
  -- then select rows which are >50ms apart
  where timestamp_prev is NULL or
  timestamp_prev < timestamp_prev_limit
)


...

https://www.db-fiddle.com/f/8KfTyqRBU62SFSoiZfpu6Q/1

I believe this is the most efficient of the three options.
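
Note that the three queries above compare each row only with the previous row of the same source, whereas the question asks about the timestamp before or after a row regardless of source. The following is a sketch of that variant, not a tested solution: it uses lag() and lead() over all rows and drops a Poll row when either neighbour is within 50 ms. The CTE name neighbours and the columns ts_prev and ts_next are made up for illustration.

```sql
with neighbours as (
  -- Pair every row with its immediate neighbours, ignoring source.
  select
    "timestamp",
    source,
    lag("timestamp")  over (order by "timestamp") as ts_prev,
    lead("timestamp") over (order by "timestamp") as ts_next
  from test
)
select "timestamp", source
from neighbours
where source <> 'Poll'  -- rows from other sources are always kept
   or (
        -- a Poll row is kept only if both neighbours are >50 ms away
        (ts_prev is null or "timestamp" - ts_prev > interval '50 milliseconds')
    and (ts_next is null or ts_next - "timestamp" > interval '50 milliseconds')
      )
order by "timestamp";
```

The neighbours are taken from the unfiltered table, so a Poll row is also compared against neighbouring rows that are themselves removed. Whether that is the desired behaviour for chains of closely spaced rows is not specified in the question.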