Finding where the running sum of a time series exceeds a given threshold

Time: 2016-01-15 16:14:46

Tags: sql oracle oracle11g

I have some time series data. For example, consider the following values (assume the times are in minutes):

User  Time  Value
a     0     10
b     1     100
c     2     200
a     3     5
e     4     7
a     5     999
a     6     8
b     7     10
a     8     10
a     9     10
a     10    10
a     11    10
a     12    100

Now I want to know whether the sum exceeds 1000 within any 5-minute interval.

For example, for the sample above I should get output such as user a, minutes 5, 6, 8, 9.

3 Answers:

Answer 0: (score: 5)

This is a simple task for a window function:

select *
from 
 (
   select t.*
     ,sum("Value") -- cumulative sum over the previous five minutes
      over (partition by "User"
            order by "Time"
            range 4 preceding) as sum_5_minutes
   from Table1 t
 ) dt
where sum_5_minutes > 1000
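
To make the frame concrete, here is a minimal Python sketch (not part of the original answer) of what `range 4 preceding` computes on the sample data: for each row, the sum of that user's values whose time lies in [t - 4, t]. The names `rows` and `window_sums` are illustrative, and the sketch assumes each user has at most one row per minute.

```python
from collections import defaultdict, deque

# Sample data from the question: (user, time in minutes, value).
rows = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("a", 12, 100),
]

def window_sums(rows, preceding=4):
    """Emulate SUM(value) OVER (PARTITION BY user ORDER BY time
    RANGE <preceding> PRECEDING), i.e. the sum over [t - preceding, t].
    Assumption: times are unique within each user."""
    by_user = defaultdict(list)
    for user, t, value in rows:
        by_user[user].append((t, value))

    result = []
    for user, series in by_user.items():
        series.sort()
        window = deque()  # rows currently inside the frame
        total = 0
        for t, value in series:
            window.append((t, value))
            total += value
            # Drop rows that have fallen out of the frame [t - preceding, t].
            while window[0][0] < t - preceding:
                total -= window.popleft()[1]
            result.append((user, t, total))
    return result

hits = sorted(r for r in window_sums(rows) if r[2] > 1000)
print(hits)
# [('a', 5, 1004), ('a', 6, 1012), ('a', 8, 1017), ('a', 9, 1027)]
```

This matches the minutes the question expects for user a (5, 6, 8, 9).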

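See the fiddle.

Edit: SQLFiddle is offline again, but you can also search over the next 5 minutes.

Edit 2: SQLFiddle is offline, but if the data type is a TIMESTAMP or DATE, you have to use an interval instead of an integer: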

select *
from 
 (
  select t.*
     ,sum("Value") 
      over (partition by "User"
            order by "Time"
            range interval '4' minute preceding) as sum_prev5_minutes
      ,sum("Value") 
      over (partition by "User"
            order by "Time"
            range between interval '0' minute preceding -- or "current row" if there are no duplicate timestamps
            and interval '4' minute following) as sum_next5_minutes

   from Table1 t
 ) dt
where sum_prev5_minutes > 1000 
   or sum_next5_minutes > 1000
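
As a rough illustration of the two frames, the Python sketch below (not from the original answer) applies a backward frame [t - 4 min, t] and a forward frame [t, t + 4 min] to timestamped rows. The base timestamp and the names `samples`, `rows` and `flagged_minutes` are assumptions made for this example only.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps: the question's minute offsets added to an arbitrary base.
base = datetime(2016, 1, 15, 16, 0)
samples = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("a", 12, 100),
]
rows = [(u, base + timedelta(minutes=m), v) for u, m, v in samples]

def flagged_minutes(rows, width=timedelta(minutes=4), threshold=1000):
    """Return (user, minute) for rows whose backward or forward frame exceeds threshold."""
    flagged = []
    for user, t, _ in rows:
        same_user = [(t2, v2) for u2, t2, v2 in rows if u2 == user]
        # Frame ending at the current row: [t - width, t].
        prev_sum = sum(v2 for t2, v2 in same_user if t - width <= t2 <= t)
        # Frame starting at the current row: [t, t + width].
        next_sum = sum(v2 for t2, v2 in same_user if t <= t2 <= t + width)
        if prev_sum > threshold or next_sum > threshold:
            flagged.append((user, t.minute))
    return sorted(flagged)

print(flagged_minutes(rows))
# [('a', 3), ('a', 5), ('a', 6), ('a', 8), ('a', 9)]
```

Compared with the backward-only frame, the forward frame additionally flags minute 3, because the window starting there already contains the 999 at minute 5.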

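Answer 1: (score: 2)

This is to illustrate my comment on dnoeth's post, so please don't accept my answer as the correct one: he did the heavy lifting and deserves the green check mark. The following shows how to set the range at runtime...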

WITH DAT AS (
SELECT 'a' u,   0 t,     10 v from dual union all
SELECT 'b' u,   1 t,       100 v from dual union all
SELECT 'c' u,   2 t,       200 v from dual union all
SELECT 'a' u,   3 t,       5 v from dual union all
SELECT 'e' u,   4 t,       7 v from dual union all
SELECT 'a' u,   5 t,       999 v from dual union all
SELECT 'a' u,   6 t,       8 v from dual union all
SELECT 'b' u,   7 t,       10 v from dual union all
SELECT 'a' u,   8  t,      10 v from dual union all
SELECT 'a' u,   9 t,       10 v from dual union all
SELECT 'a' u,   10 t,      10 v from dual union all
SELECT 'a' u,   11 t,      10 v from dual union all
SELECT 'a' u,   12 t,      100 v from dual )
  -- Imagine passing a variable into this second query, setting it in a config table, or whatever.
  -- This just shows that you don't have to hard-code it into the actual select clause, and that the value can be determined at runtime.
  -- RANGE n PRECEDING also includes the current row, so 4 gives a five-minute window.
, wind as (select 4 rng from dual)
select d.*
     ,sum(v) -- cumulative sum over the previous five minutes
      over (partition by u order by t
            range w.rng preceding) as sum_5_minutes
   from dat d
      join wind w on 1=1
   order by u,t;
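
One detail worth checking when the offset comes from a variable: `range n preceding` includes the current row, so the frame spans n + 1 one-minute slots. The Python sketch below (illustrative names, user a's rows only) compares offsets of 4 and 5.

```python
# User a's rows from the question: time in minutes -> value.
series_a = {0: 10, 3: 5, 5: 999, 6: 8, 8: 10, 9: 10, 10: 10, 11: 10, 12: 100}

def flagged_times(series, preceding, threshold=1000):
    """Times t whose frame [t - preceding, t] sums to more than threshold."""
    return [
        t for t in sorted(series)
        if sum(v for t2, v in series.items() if t - preceding <= t2 <= t) > threshold
    ]

print(flagged_times(series_a, 4))  # frame covers 5 minute slots: [5, 6, 8, 9]
print(flagged_times(series_a, 5))  # frame covers 6 minute slots: [5, 6, 8, 9, 10]
```

With an offset of 5, minute 10 is flagged as well, because the frame 5..10 still contains the 999 recorded at minute 5.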

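I would also note that lad2025 is correct: this window misses some rows in the set. To correct that, you need to bring back every row that falls inside the range for users whose previous five minutes exceeded 1000. This works for user z below, whereas the original code would only bring back the second row.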

WITH DAT AS (
SELECT 'a' u,   0 t,     10 v from dual union all
SELECT 'b' u,   1 t,       100 v from dual union all
SELECT 'c' u,   2 t,       200 v from dual union all
SELECT 'a' u,   3 t,       5 v from dual union all
SELECT 'e' u,   4 t,       7 v from dual union all
SELECT 'a' u,   5 t,       999 v from dual union all
SELECT 'a' u,   6 t,       8 v from dual union all
SELECT 'b' u,   7 t,       10 v from dual union all
SELECT 'a' u,   8  t,      10 v from dual union all
SELECT 'a' u,   9 t,       10 v from dual union all
SELECT 'a' u,   10 t,      10 v from dual union all
SELECT 'a' u,   11 t,      10 v from dual union all
-- two Z rows added. In the initial version only the second row would be caught.
SELECT 'z' u,   10 t,      999 v from dual union all
SELECT 'z' u,   11 t,      10 v from dual union all
SELECT 'a' u,   12 t,      100 v from dual )
, wind as (select 4 rng from dual) -- 4 preceding plus the current row = five minutes
SELECT dd.*, sum_5_minutes
from dat dd
JOIN (
  SELECT * FROM ( 
        select d.*
             ,sum(v) -- cumulative sum over the previous five minutes
              over (partition by u order by t
                    range w.rng preceding) as sum_5_minutes
             ,min(t) -- start point of the range that we are covering
              over (partition by u order by t
                    range w.rng preceding) as rng_5_minutes
           from dat d
              join wind w on 1=1
   ) WHERE    sum_5_minutes > 1000 ) fails
on dd.u = fails.u
and dd.t >= fails.rng_5_minutes
and dd.t <= fails.t           
order by dd.u, dd.t;
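
The join above can return the same row more than once when it falls inside several offending windows. As a hedged sketch of the same idea without duplicates, the Python below collects every row inside any offending frame into a set. It assumes an offset of 4 (a five-minute window), and the function name is illustrative.

```python
# Sample data plus the two z rows: (user, time in minutes, value).
rows = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("z", 10, 999), ("z", 11, 10), ("a", 12, 100),
]

def rows_in_offending_windows(rows, preceding=4, threshold=1000):
    """Return each (user, time) that lies inside at least one frame
    [t - preceding, t] whose sum exceeds threshold."""
    keep = set()
    for user, t, _ in rows:
        frame = [
            (t2, v2) for u2, t2, v2 in rows
            if u2 == user and t - preceding <= t2 <= t
        ]
        if sum(v2 for _, v2 in frame) > threshold:
            # Report every row of the offending frame; the set removes repeats.
            keep.update((user, t2) for t2, _ in frame)
    return sorted(keep)

print(rows_in_offending_windows(rows))
# [('a', 3), ('a', 5), ('a', 6), ('a', 8), ('a', 9), ('z', 10), ('z', 11)]
```

Both z rows are reported, including the one at minute 10 that a plain `sum > 1000` filter would miss.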

Answer 2: (score: 1)

Here is my attempt:

select
  s1."user", s1."time", sum (s2."value") as five_minute_value
from
  sample s1
  left join sample s2 on
    s1."user" = s2."user" and
    s1."time" between s2."time" and s2."time" + 4
group by
  s1."user", s1."time"
having
  sum (s2."value") > 1000

Output for your data:

a   8   1017
a   9   1027
a   6   1012
a   5   1004