I have some time series data. For example, look at the following values (assume the time here is in minutes):
User Time Value
a 0 10
b 1 100
c 2 200
a 3 5
e 4 7
a 5 999
a 6 8
b 7 10
a 8 10
a 9 10
a 10 10
a 11 10
a 12 100
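Now I want to know whether a sum of more than 1000 was reached within any given 5-minute interval.
For example, with the data above I should get output such as: user a, minutes 5, 6, 8, 9.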
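One way to pin down what "any given 5-minute interval" means is to treat each row's minute as the end of a trailing window. That window covers the row's minute and the four minutes before it, and the values are summed per user. The brute-force Python sketch below is my own reading of the question, not code from the thread. ROWS and flagged_minutes are hypothetical names. Under that reading, the sketch reproduces the expected minutes 5, 6, 8 and 9 for user a.

```python
# Sample data from the question: (user, minute, value)
ROWS = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("a", 12, 100),
]

def flagged_minutes(rows, width=5, threshold=1000):
    """Return sorted (user, minute) pairs whose trailing window
    [minute - width + 1, minute] sums to more than `threshold`
    for that user. Brute force: O(n^2), fine for small inputs."""
    hits = []
    for user, t, _ in rows:
        total = sum(v for u, s, v in rows
                    if u == user and t - width < s <= t)
        if total > threshold:
            hits.append((user, t))
    return sorted(hits)

print(flagged_minutes(ROWS))
# [('a', 5), ('a', 6), ('a', 8), ('a', 9)]
```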
Answer 0 (score: 5)
This is a simple task for a window function:
select *
from
(
select t.*
,sum("Value") -- sum over the current minute and the four preceding ones (a 5-minute window)
over (partition by "User"
order by "Time"
range 4 preceding) as sum_5_minutes
from Table1 t
) dt
where sum_5_minutes > 1000
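SQLite also supports numeric RANGE frames, so it offers a quick way to try the window query without Oracle. The harness below is my own minimal sketch, not part of the answer. It assumes Python's bundled SQLite is version 3.28 or newer, the first release that accepts "n PRECEDING" in RANGE frames. The table definition is inferred from the question.

```python
import sqlite3

# Sample data from the question: (user, minute, value)
ROWS = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("a", 12, 100),
]

conn = sqlite3.connect(":memory:")
conn.execute('create table Table1 ("User" text, "Time" integer, "Value" integer)')
conn.executemany("insert into Table1 values (?, ?, ?)", ROWS)

# Same shape as the answer's query: RANGE 4 PRECEDING = current minute + 4 before it.
QUERY = '''
select "User", "Time", sum_5_minutes
from (
  select t.*,
         sum("Value") over (partition by "User"
                            order by "Time"
                            range 4 preceding) as sum_5_minutes
  from Table1 t
) dt
where sum_5_minutes > 1000
order by "User", "Time"
'''
result = conn.execute(QUERY).fetchall()
print(result)
# [('a', 5, 1004), ('a', 6, 1012), ('a', 8, 1017), ('a', 9, 1027)]
```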
See the fiddle.
Edit: SQLFiddle is offline again, but you can also search forward, over the following 5 minutes.
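Looking forward only changes the frame: "range between current row and 4 following" sums the current minute and the next four. Below is a sketch that checks both directions at once. It uses the same assumptions as a SQLite harness: SQLite 3.28 or newer, and a table layout inferred from the question. With this data, the forward window also flags minute 3, where user a's burst starts. The backward window alone does not report that minute.

```python
import sqlite3

# Sample data from the question: (user, minute, value)
ROWS = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("a", 12, 100),
]

conn = sqlite3.connect(":memory:")
conn.execute('create table Table1 ("User" text, "Time" integer, "Value" integer)')
conn.executemany("insert into Table1 values (?, ?, ?)", ROWS)

QUERY = '''
select "User", "Time", sum_prev5, sum_next5
from (
  select t.*,
         -- current minute and the four before it
         sum("Value") over (partition by "User" order by "Time"
                            range between 4 preceding and current row) as sum_prev5,
         -- current minute and the four after it
         sum("Value") over (partition by "User" order by "Time"
                            range between current row and 4 following) as sum_next5
  from Table1 t
) dt
where sum_prev5 > 1000 or sum_next5 > 1000
order by "User", "Time"
'''
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```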
Edit 2: SQLFiddle is offline, but if the data type is TIMESTAMP or DATE, you must use an interval instead of an integer:
select *
from
(
select t.*
,sum("Value")
over (partition by "User"
order by "Time"
range interval '4' minute preceding) as sum_prev5_minutes
,sum("Value")
over (partition by "User"
order by "Time"
range between interval '0' minute preceding -- or "current row" if there are no duplicate timestamps
and interval '4' minute following) as sum_next5_minutes
from Table1 t
) dt
where sum_prev5_minutes > 1000
or sum_next5_minutes > 1000
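The point of Edit 2 is that a frame offset on TIMESTAMP or DATE columns has to be an interval. The Python sketch below does the same thing with datetime values and a timedelta offset. Both bounds are inclusive, as in the RANGE frames above. The sketch is hypothetical in several ways. BASE is an invented start time, the sample minutes are converted into timestamps, only user a's rows are used, and window_sums is an illustrative name.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

# Hypothetical start time; user a's sample rows as (timestamp, value), sorted by time.
BASE = datetime(2024, 1, 1, 12, 0)
A_ROWS = [(BASE + timedelta(minutes=m), v)
          for m, v in [(0, 10), (3, 5), (5, 999), (6, 8), (8, 10),
                       (9, 10), (10, 10), (11, 10), (12, 100)]]

def window_sums(rows, span=timedelta(minutes=4)):
    """For each (ts, value) row (rows sorted by ts), return
    (ts, sum over [ts - span, ts], sum over [ts, ts + span]).
    Rows sharing a timestamp fall in both windows, like RANGE peers."""
    times = [ts for ts, _ in rows]
    values = [v for _, v in rows]
    out = []
    for ts in times:
        first_prev = bisect_left(times, ts - span)   # first row >= ts - span
        first_here = bisect_left(times, ts)          # first row >= ts
        after_here = bisect_right(times, ts)         # first row > ts
        after_next = bisect_right(times, ts + span)  # first row > ts + span
        out.append((ts,
                    sum(values[first_prev:after_here]),
                    sum(values[first_here:after_next])))
    return out

flagged = [(ts, prev, nxt) for ts, prev, nxt in window_sums(A_ROWS)
           if prev > 1000 or nxt > 1000]
for ts, prev, nxt in flagged:
    print(ts.strftime("%H:%M"), prev, nxt)
```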
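Answer 1 (score: 2)
This is to illustrate my comment on dnoeth's post. Don't treat my answer as the correct one: he did the heavy lifting and deserves the green check mark. The following shows how the range can be set at runtime: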
WITH DAT AS (
SELECT 'a' u, 0 t, 10 v from dual union all
SELECT 'b' u, 1 t, 100 v from dual union all
SELECT 'c' u, 2 t, 200 v from dual union all
SELECT 'a' u, 3 t, 5 v from dual union all
SELECT 'e' u, 4 t, 7 v from dual union all
SELECT 'a' u, 5 t, 999 v from dual union all
SELECT 'a' u, 6 t, 8 v from dual union all
SELECT 'b' u, 7 t, 10 v from dual union all
SELECT 'a' u, 8 t, 10 v from dual union all
SELECT 'a' u, 9 t, 10 v from dual union all
SELECT 'a' u, 10 t, 10 v from dual union all
SELECT 'a' u, 11 t, 10 v from dual union all
SELECT 'a' u, 12 t, 100 v from dual )
-- imagine passing a variable in to this second query, setting it in a config table, or whatever.
-- This is just showing that you don't have to hard-code it into the actual select clause, and that the value can be determined at runtime.
, wind as (select 4 rng from dual) -- 4 preceding plus the current minute = a 5-minute window
select d.*
,sum(v) -- cumulative sum over the previous five minutes
over (partition by u order by t
range w.rng preceding) as sum_5_minutes
from dat d
join wind w on 1=1
order by u,t;
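The wind CTE makes the window length data rather than a literal. The Python sketch below is a procedural equivalent. It does a two-pointer scan over each user's rows sorted by minute, and it takes the width as an argument at call time. sliding_window_hits is a hypothetical name. The sketch also assumes that no two rows share the same user and minute, because it does not group RANGE peers.

```python
from collections import defaultdict

# Sample data from the question: (user, minute, value)
ROWS = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("a", 12, 100),
]

def sliding_window_hits(rows, width, threshold=1000):
    """Return sorted (user, minute, window_sum) for every row whose trailing
    window [minute - width + 1, minute] sums to more than `threshold`.
    width=5 corresponds to RANGE 4 PRECEDING.
    Assumes at most one row per user and minute."""
    by_user = defaultdict(list)
    for user, t, v in rows:
        by_user[user].append((t, v))
    hits = []
    for user, series in by_user.items():
        series.sort()
        start, running = 0, 0
        for t, v in series:
            running += v
            # Evict rows that have slid out of the window on the left.
            while series[start][0] <= t - width:
                running -= series[start][1]
                start += 1
            if running > threshold:
                hits.append((user, t, running))
    return sorted(hits)

print(sliding_window_hits(ROWS, width=5))
print(sliding_window_hits(ROWS, width=6))
```

Setting width=6 reproduces what RANGE 5 PRECEDING covers: the current minute plus the five before it. With that width, minute 10 is flagged as well (999 + 8 + 10 + 10 + 10 = 1037), so the range value needs care.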
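I also noticed that lad2025 is right: this window misses some rows of the set. To correct this, you need to bring back all rows within the range for any user whose previous five minutes exceed 1000. This works for user z below, whereas the original code would only bring back the second of its two rows.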
WITH DAT AS (
SELECT 'a' u, 0 t, 10 v from dual union all
SELECT 'b' u, 1 t, 100 v from dual union all
SELECT 'c' u, 2 t, 200 v from dual union all
SELECT 'a' u, 3 t, 5 v from dual union all
SELECT 'e' u, 4 t, 7 v from dual union all
SELECT 'a' u, 5 t, 999 v from dual union all
SELECT 'a' u, 6 t, 8 v from dual union all
SELECT 'b' u, 7 t, 10 v from dual union all
SELECT 'a' u, 8 t, 10 v from dual union all
SELECT 'a' u, 9 t, 10 v from dual union all
SELECT 'a' u, 10 t, 10 v from dual union all
SELECT 'a' u, 11 t, 10 v from dual union all
-- two Z rows added. In the initial version only the second row would be caught.
SELECT 'z' u, 10 t, 999 v from dual union all
SELECT 'z' u, 11 t, 10 v from dual union all
SELECT 'a' u, 12 t, 100 v from dual )
, wind as (select 4 rng from dual) -- 4 preceding plus the current minute = a 5-minute window
SELECT dd.*, sum_5_minutes
from dat dd
JOIN (
SELECT * FROM (
select d.*
,sum(v) -- cumulative sum over the previous five minutes
over (partition by u order by t
range w.rng preceding) as sum_5_minutes
,min(t) -- start point of the range that we are covering
over (partition by u order by t
range w.rng preceding) as rng_5_minutes
from dat d
join wind w on 1=1
) WHERE sum_5_minutes > 1000 ) fails
on dd.u = fails.u
and dd.t >= fails.rng_5_minutes
and dd.t <= fails.t
order by dd.u, dd.t;
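The goal here is to return every row inside a window whose total exceeds 1000, not just the row where that window ends. The Python sketch below shows the same set logic. It collects the rows of each failing trailing window into a set, using the two z rows the answer adds. rows_in_failing_windows is a hypothetical name. The set matters because the SQL join above returns a row once for every failing window that covers it. User a's minute-5 row, for example, falls inside several such windows.

```python
# Sample data plus the two "z" rows added in the answer: (user, minute, value)
ROWS = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("z", 10, 999), ("z", 11, 10), ("a", 12, 100),
]

def rows_in_failing_windows(rows, width=5, threshold=1000):
    """Return every row that belongs to at least one trailing window
    [t - width + 1, t] whose per-user sum exceeds `threshold`.
    The set drops the duplicates that overlapping windows produce."""
    keep = set()
    for user, t, _ in rows:
        window = [r for r in rows if r[0] == user and t - width < r[1] <= t]
        if sum(r[2] for r in window) > threshold:
            keep.update(window)
    return sorted(keep)

for row in rows_in_failing_windows(ROWS):
    print(row)
```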
Answer 2 (score: 1)
Here is my attempt:
select
s1."user", s1."time", sum (s2."value") as five_minute_value
from
sample s1
left join sample s2 on
s1."user" = s2."user" and
s1."time" between s2."time" and s2."time" + 4
group by
s1."user", s1."time"
having
sum (s2."value") > 1000
Output for your data:
a 8 1017
a 9 1027
a 6 1012
a 5 1004
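This self-join needs no window-function support, so it also runs on SQLite. The harness below is my own minimal check of the output above. It makes two changes to the answer's query. It uses an inner join, because every row matches itself and LEFT JOIN therefore adds nothing. It also adds an ORDER BY so the result comes back in a stable order.

```python
import sqlite3

# Sample data from the question: (user, minute, value)
ROWS = [
    ("a", 0, 10), ("b", 1, 100), ("c", 2, 200), ("a", 3, 5),
    ("e", 4, 7), ("a", 5, 999), ("a", 6, 8), ("b", 7, 10),
    ("a", 8, 10), ("a", 9, 10), ("a", 10, 10), ("a", 11, 10),
    ("a", 12, 100),
]

conn = sqlite3.connect(":memory:")
conn.execute('create table sample ("user" text, "time" integer, "value" integer)')
conn.executemany("insert into sample values (?, ?, ?)", ROWS)

# s2 ranges over rows of the same user within the 4 minutes before s1 (inclusive).
QUERY = '''
select s1."user", s1."time", sum(s2."value") as five_minute_value
from sample s1
join sample s2
  on s1."user" = s2."user"
 and s1."time" between s2."time" and s2."time" + 4
group by s1."user", s1."time"
having sum(s2."value") > 1000
order by s1."user", s1."time"
'''
result = conn.execute(QUERY).fetchall()
print(result)
# [('a', 5, 1004), ('a', 6, 1012), ('a', 8, 1017), ('a', 9, 1027)]
```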