I have a simple table:
ID --- CreateDate --- Value
1 --- 2015-09-25 10:00 --- 1
1 --- 2015-09-25 10:30 --- 2
1 --- 2015-09-25 11:00 --- 3
1 --- 2015-09-25 11:30 --- 2
1 --- 2015-09-25 12:00 --- 1
2 --- 2015-09-25 10:00 --- 2
2 --- 2015-09-25 10:30 --- 3
2 --- 2015-09-25 11:00 --- 3
2 --- 2015-09-25 11:30 --- 3
2 --- 2015-09-25 12:00 --- 2
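Within a given time frame (i.e. 24 hours), I need to find sequences in the data where the value stays greater than 2 for 1½ hours or more. If I could be sure that every measurement point (the rows from each ID) arrived at exactly 30-minute intervals, this would not be a problem.
Then, using window functions, my result would look like: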
2 --- 2015-09-25 10:30 --- 3
2 --- 2015-09-25 11:00 --- 3
2 --- 2015-09-25 11:30 --- 3
The problem is that some IDs produce twice as many rows (some even more). Like this:
1 --- 2015-09-25 10:00 --- 1
1 --- 2015-09-25 10:30 --- 3
1 --- 2015-09-25 11:00 --- 3
1 --- 2015-09-25 11:30 --- 3
1 --- 2015-09-25 12:00 --- 1
2 --- 2015-09-25 10:00 --- 1
2 --- 2015-09-25 10:15 --- 2
2 --- 2015-09-25 10:30 --- 3
2 --- 2015-09-25 10:45 --- 3
2 --- 2015-09-25 11:00 --- 3
2 --- 2015-09-25 11:15 --- 3
2 --- 2015-09-25 11:30 --- 3
2 --- 2015-09-25 11:45 --- 2
2 --- 2015-09-25 12:00 --- 2
In this case, I would like my result to look like this:
1 --- 2015-09-25 10:30 --- 3
1 --- 2015-09-25 11:00 --- 3
1 --- 2015-09-25 11:30 --- 3
2 --- 2015-09-25 10:30 --- 3
2 --- 2015-09-25 10:45 --- 3
2 --- 2015-09-25 11:00 --- 3
2 --- 2015-09-25 11:15 --- 3
2 --- 2015-09-25 11:30 --- 3
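However, as far as I know, window functions do not support time-based or column-based frame arguments. So what are my alternatives when I cannot simply "count" a specific number of rows?
I am open to suggestions: alternatives within SQL Server, changes to the table structure, anything.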
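Answer 0 (score: 1)
This is a type of gaps-and-islands problem. You need to identify groups of adjacent rows whose values are greater than 2. One approach is to use the difference of row numbers: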
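select t.*
from (select t.*,
             max(value) over (partition by id, flag, grp) as maxvalue,
             count(*) over (partition by id, flag, grp) as cnt
      from (select t.*,
                   (case when value > 2 then 1 else 0 end) as flag,
                   -- the difference of the two row numbers is constant within a run of equal flags
                   (row_number() over (partition by id order by createdate) -
                    row_number() over (partition by id, (case when value > 2 then 1 else 0 end)
                                       order by createdate)
                   ) as grp
            from table t  -- "table" is a placeholder for the real table name
           ) t
     ) t
-- flag is part of the partition because grp alone can collide between a high run and a low run
where cnt >= 3 and maxvalue > 2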
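The query then computes the row count and the maximum value in each group, and keeps the groups whose values are large and whose length is at least 3. Note: you could instead get the minimum and maximum times of each group and check that the difference is at least 1.5 hours. However, a sequence of length 3 seems to fit your criteria.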
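The time-based check mentioned in the note could be sketched roughly as below. This is only an illustration under some assumptions: it runs the query in SQLite through Python's sqlite3 module so that it can be tried against the sample data from the question, and the table name `measurements`, the function `long_runs` and the parameter `min_minutes` are made up for the example. In SQL Server the span would more likely be computed with DATEDIFF(minute, first_ts, last_ts). Also note that in the desired output the first and last rows of each run are 60 minutes apart, so the sketch assumes a 60-minute threshold on (last - first); a literal 90-minute threshold would reject those runs.

```python
import sqlite3

# Second sample from the question: ID 2 reports every 15 minutes.
ROWS = [
    (1, "2015-09-25 10:00", 1), (1, "2015-09-25 10:30", 3),
    (1, "2015-09-25 11:00", 3), (1, "2015-09-25 11:30", 3),
    (1, "2015-09-25 12:00", 1),
    (2, "2015-09-25 10:00", 1), (2, "2015-09-25 10:15", 2),
    (2, "2015-09-25 10:30", 3), (2, "2015-09-25 10:45", 3),
    (2, "2015-09-25 11:00", 3), (2, "2015-09-25 11:15", 3),
    (2, "2015-09-25 11:30", 3), (2, "2015-09-25 11:45", 2),
    (2, "2015-09-25 12:00", 2),
]

# Same row-number-difference grouping as the answer, but each run is
# filtered by its time span instead of its row count.
QUERY = """
select id, createdate, value
from (select g.*,
             min(createdate) over (partition by id, flag, grp) as first_ts,
             max(createdate) over (partition by id, flag, grp) as last_ts
      from (select m.*,
                   case when value > 2 then 1 else 0 end as flag,
                   row_number() over (partition by id order by createdate) -
                   row_number() over (partition by id,
                                      case when value > 2 then 1 else 0 end
                                      order by createdate) as grp
            from measurements m
           ) g
     ) r
where flag = 1
  and (cast(strftime('%s', last_ts) as integer) -
       cast(strftime('%s', first_ts) as integer)) / 60 >= :min_minutes
order by id, createdate
"""


def long_runs(rows, min_minutes=60):
    """Return rows belonging to runs of value > 2 spanning at least min_minutes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table measurements (id integer, createdate text, value integer)"
    )
    conn.executemany("insert into measurements values (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"min_minutes": min_minutes}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in long_runs(ROWS):
        print(row)
```

With the second sample from the question, long_runs(ROWS) returns the eight rows listed as the desired result, regardless of whether an ID reports every 30 or every 15 minutes. Raising min_minutes to 90 returns nothing, because both runs span exactly 60 minutes from first to last row.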