My task is to detect three events on different accounts within 1 hour.
The solution might look something like
count(distinct account_id) over (order by time_key range between 20 PRECEDING and CURRENT ROW)
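and then check count() >= 3.
But Oracle does not allow a DISTINCT aggregate together with an ORDER BY clause in the window:
ORA-30487: ORDER BY not allowed here
I have the workaround below, but it seems clumsy: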
with t_data as (
select 1 as account_id, 1000 as time_key from dual union
select 1 as account_id, 1010 as time_key from dual union
select 1 as account_id, 1020 as time_key from dual union
select 1 as account_id, 1030 as time_key from dual union
select 2 as account_id, 1040 as time_key from dual union
select 3 as account_id, 1050 as time_key from dual union
select 3 as account_id, 1060 as time_key from dual union
select 3 as account_id, 1070 as time_key from dual union
select 3 as account_id, 1080 as time_key from dual union
select 3 as account_id, 1090 as time_key from dual
order by time_key
)
select *
from (
    select account_id,
           time_key,
           max(case when account_id = 1 then 1 else 0 end)
               over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m1,
           max(case when account_id = 2 then 1 else 0 end)
               over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m2,
           max(case when account_id = 3 then 1 else 0 end)
               over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m3
    from t_data
)
where m1 = 1 and m2 = 1 and m3 = 1
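What is a simpler way to determine the number of distinct values in a sliding window?
Answer 0 (score: 1)
It is not immediately obvious to me how to do this with window functions. You can use a correlated subquery: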
select t.*,
       (select count(distinct t2.account_id)
        from t_data t2
        where t2.time_key >= t.time_key - 20 and t2.time_key <= t.time_key
       ) as num_distinct
from t_data t;
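To make the semantics of the correlated subquery concrete, here is a minimal Python sketch that does the same thing by brute force on the sample rows from the question. The names `T_DATA` and `distinct_in_window` are made up for this illustration and are not part of the original answer; it is only meant as a reference to check other approaches against, not as something to run on a large table.

```python
# Sample rows from the question: (account_id, time_key)
T_DATA = [
    (1, 1000), (1, 1010), (1, 1020), (1, 1030),
    (2, 1040),
    (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090),
]

def distinct_in_window(rows, width=20):
    """For each row, count distinct account_ids with time_key in [t - width, t]."""
    result = []
    for account_id, t in rows:
        # Same predicate as the subquery: t - width <= t2.time_key <= t
        accounts = {a for a, tk in rows if t - width <= tk <= t}
        result.append((account_id, t, len(accounts)))
    return result

for row in distinct_in_window(T_DATA):
    print(row)
```

On the sample data only the row (3, 1050) reaches a count of 3, because its window [1030, 1050] contains one event from each account.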
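Another approach, which may perform better, is to treat this as a gaps-and-islands problem. The following version returns the number of distinct accounts present at the same time for each time key: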
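with t as (
      -- one interval per account and island: the account counts as present from its
      -- first event until 20 after its last event in that island
      select account_id, min(time_key) as min_time_key, max(time_key) + 20 as max_time_key
      from (select t.*,
                   -- a new island starts when the gap to the account's previous event exceeds 20
                   sum(case when time_key - prev_time_key <= 20 then 0 else 1 end)
                       over (partition by account_id order by time_key) as grp
            from (select t.*, lag(time_key) over (partition by account_id order by time_key) as prev_time_key
                  from t_data t
                 ) t
           ) t
      group by account_id, grp
     )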
select td.account_id, td.time_key, count(distinct t.account_id) as num_distinct
from t_data td join
t
on td.time_key between t.min_time_key and t.max_time_key
group by td.account_id, td.time_key;
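The island idea can be sketched in Python as follows. This is an illustration under my own assumptions, not the answer's code: it keeps one interval per account per run of events whose gaps are at most 20, and the helper names `account_islands` and `distinct_per_row` are hypothetical. An account falls inside the window of a row at time t exactly when t lies in one of that account's intervals.

```python
from collections import defaultdict

# Sample rows from the question: (account_id, time_key)
T_DATA = [
    (1, 1000), (1, 1010), (1, 1020), (1, 1030),
    (2, 1040),
    (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090),
]

def account_islands(rows, width=20):
    """Merge each account's events into intervals (account_id, start, end + width)."""
    by_account = defaultdict(list)
    for account_id, t in rows:
        by_account[account_id].append(t)
    islands = []
    for account_id, times in by_account.items():
        times.sort()
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > width:  # gap too large: close the current island
                islands.append((account_id, start, prev + width))
                start = t
            prev = t
        islands.append((account_id, start, prev + width))
    return islands

def distinct_per_row(rows, width=20):
    """Count, for each row, the accounts that have an island covering its time_key."""
    islands = account_islands(rows, width)
    return [
        (a, t, len({ia for ia, lo, hi in islands if lo <= t <= hi}))
        for a, t in rows
    ]

print(distinct_per_row(T_DATA))
```

On the sample data this gives the same counts as a brute-force window scan. If an account has events that are far apart, each run must become its own interval, otherwise the account would be counted as present during the gap.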
Finally, if you are only looking for 3 (or 2) account ids and only care about finding examples where that maximum is reached, you can do the following:
select t.*
from (select t.*,
min(account_id) over (order by time_key range between 20 preceding and 1 preceding) as min_account_id,
max(account_id) over (order by time_key range between 20 preceding and 1 preceding) as max_account_id
from t_data t
) t
where min_account_id <> max_account_id and
account_id <> min_account_id and
account_id <> max_account_id;
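This takes the maximum and minimum account ids from the preceding 20 units of time_key (a RANGE window, not 20 rows), excluding the current row. If those two values differ from each other and from the current value, you have three distinct values.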
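A small Python sketch of the same min/max check, assuming integer time keys so that "20 preceding and 1 preceding" means time keys in [t - 20, t - 1]. The function name `rows_with_three_accounts` is made up for this illustration.

```python
# Sample rows from the question: (account_id, time_key)
T_DATA = [
    (1, 1000), (1, 1010), (1, 1020), (1, 1030),
    (2, 1040),
    (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090),
]

def rows_with_three_accounts(rows, width=20):
    """Return rows whose account differs from both the min and max account before them."""
    hits = []
    for account_id, t in rows:
        earlier = [a for a, tk in rows if t - width <= tk <= t - 1]
        if not earlier:
            continue  # empty window: min/max would be NULL in SQL, so the row is filtered out
        lo, hi = min(earlier), max(earlier)
        if lo != hi and account_id != lo and account_id != hi:
            hits.append((account_id, t))
    return hits

print(rows_with_three_accounts(T_DATA))
```

Note that this finds examples rather than every qualifying row: for events (1, 0), (2, 5), (3, 10), (3, 15) it reports the row at time 10 but not the one at 15, even though the window ending at 15 also contains three accounts, because the current account equals the maximum there.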
Answer 1 (score: 1)
Here is a very simple approach. Its performance can be improved; perhaps you could post some details about the table size.
select t1.account_id, t1.time_key, count(distinct t2.account_id) cnt
from t_data t1 cross join t_data t2
where t2.time_key between t1.time_key - 20 and t1.time_key
group by t1.account_id, t1.time_key
having count(distinct t2.account_id) >= 3;
Answer 2 (score: 1)
If you really want to use only a single window clause, here is one approach:
with product_of_primes as (
select t.*, round(exp(sum(ln(decode(account_id,1,2,2,3,3,5)))
over ( order by time_key range between 20 preceding
and current row ))) product from t_data t
)
select account_id, time_key from product_of_primes
where mod(product,2*3*5) = 0;
Explanation:
If you were on my team and wrote this, I would kill you.
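For readers who want the mechanics spelled out, here is a hedged Python sketch of what the query appears to compute. Each account id is mapped to a prime (1 to 2, 2 to 3, 3 to 5, as in the `decode`), the window product is formed as exp(sum(ln(prime))), and by unique prime factorization that product is divisible by 2*3*5 = 30 exactly when all three accounts occur in the window. The names `PRIME_FOR_ACCOUNT` and `window_products` are invented for this illustration.

```python
import math

# Sample rows from the question: (account_id, time_key)
T_DATA = [
    (1, 1000), (1, 1010), (1, 1020), (1, 1030),
    (2, 1040),
    (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090),
]

# Same mapping as decode(account_id, 1, 2, 2, 3, 3, 5)
PRIME_FOR_ACCOUNT = {1: 2, 2: 3, 3: 5}

def window_products(rows, width=20):
    """Product of the primes of all rows with time_key in [t - width, t], via logs."""
    out = []
    for account_id, t in rows:
        log_sum = sum(
            math.log(PRIME_FOR_ACCOUNT[a]) for a, tk in rows if t - width <= tk <= t
        )
        # round() undoes the floating-point error introduced by exp(sum(ln(...)))
        out.append((account_id, t, round(math.exp(log_sum))))
    return out

hits = [(a, t, p) for a, t, p in window_products(T_DATA) if p % 30 == 0]
print(hits)
```

The sketch also shows why the trick is fragile: the mapping has to be extended by hand for every new account id, and the product grows quickly, so rounding a floating-point result back to an exact integer stops being reliable for wide windows.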
Full example with data:
with t_data as (
select 1 as account_id, 1000 as time_key from dual union
select 1 as account_id, 1010 as time_key from dual union
select 1 as account_id, 1020 as time_key from dual union
select 1 as account_id, 1030 as time_key from dual union
select 2 as account_id, 1040 as time_key from dual union
select 3 as account_id, 1050 as time_key from dual union
select 3 as account_id, 1060 as time_key from dual union
select 3 as account_id, 1070 as time_key from dual union
select 3 as account_id, 1080 as time_key from dual union
select 3 as account_id, 1090 as time_key from dual
order by time_key
),
product_of_primes as (
select t.*, round(exp(sum(ln(decode(account_id,1,2,2,3,3,5)))
over ( order by time_key range between 20 preceding
and current row ))) product from t_data t
)
select account_id, time_key, product from product_of_primes
where mod(product,2*3*5) = 0;
Result:
+------------+----------+---------+
| ACCOUNT_ID | TIME_KEY | PRODUCT |
+------------+----------+---------+
| 3 | 1050 | 30 |
+------------+----------+---------+