How to use Oracle

Date: 2018-08-16 23:44:03

Tags: sql oracle window

My task is to detect when three events on different accounts occur within one hour.

A solution might look something like:

count(distinct account_id) over (order by time_key range between 20 PRECEDING and CURRENT ROW)

and then check that count() >= 3.

But Oracle does not allow DISTINCT in an analytic function that has an ORDER BY clause:

ORA-30487: ORDER BY not allowed here

I have the workaround below, but it seems clumsy:

with t_data as (
select 1 as account_id, 1000 as time_key from dual union
select 1 as account_id, 1010 as time_key from dual union
select 1 as account_id, 1020 as time_key from dual union
select 1 as account_id, 1030 as time_key from dual union
select 2 as account_id, 1040 as time_key from dual union
select 3 as account_id, 1050 as time_key from dual union
select 3 as account_id, 1060 as time_key from dual union
select 3 as account_id, 1070 as time_key from dual union
select 3 as account_id, 1080 as time_key from dual union
select 3 as account_id, 1090 as time_key from dual
order by time_key
)

select *
from (
  select  account_id,
          time_key,
          max(
              case 
               when account_id = 1 then 1
               else 0
              end
          ) over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m1,
          max(
              case 
               when account_id = 2 then 1
               else 0
              end
          ) over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m2,
          max(
              case 
               when account_id = 3 then 1
               else 0
              end
          ) over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m3
  from t_data
)
where m1 = 1 and m2 = 1 and m3 = 1
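To make the workaround's logic concrete, here is a minimal Python sketch that mirrors it on the sample rows from t_data. It assumes time_key is an integer and the window is the same 20 units used in the query; `flag_workaround` is a hypothetical helper name, not part of the question. Like the SQL, it only works because the three account IDs are hard-coded.

```python
# Sample rows from t_data as (account_id, time_key)
T_DATA = [(1, 1000), (1, 1010), (1, 1020), (1, 1030), (2, 1040),
          (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090)]

WINDOW = 20  # RANGE BETWEEN 20 PRECEDING AND CURRENT ROW (assumed integer units)


def flag_workaround(rows, accounts=(1, 2, 3), window=WINDOW):
    """Hypothetical mirror of the m1/m2/m3 query: one flag per hard-coded
    account, set to 1 if that account appears in the row's RANGE window."""
    result = []
    for acc, t in rows:
        # RANGE semantics: every row whose time_key is in [t - window, t], peers included
        in_window = [a for a, t2 in rows if t - window <= t2 <= t]
        # Equivalent of max(case when account_id = X then 1 else 0 end) over (...)
        flags = [int(target in in_window) for target in accounts]
        if all(flags):  # where m1 = 1 and m2 = 1 and m3 = 1
            result.append((acc, t))
    return result


print(flag_workaround(T_DATA))
```

Adding a fourth account would mean adding a fourth flag, which is exactly why the SQL version feels clumsy.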

Is there a simpler way to count the distinct events in a sliding window?

3 answers:

Answer 0 (score: 1)

It is not immediately obvious to me how to do this with window functions. You can use a correlated subquery:

select t.*,
       (select count(distinct t2.account_id)
        from t_data t2
        where t2.time_key >= t.time_key - 20 and t2.time_key <= t.time_key
       )
from t_data t;
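The correlated subquery computes, for each row, the number of distinct account_ids whose time_key falls in [time_key - 20, time_key]. A brute-force Python sketch of the same predicate, assuming the sample data above and a hypothetical helper `distinct_in_window`, shows the per-row counts it would produce. Like the subquery, it is quadratic in the number of rows.

```python
# Sample rows from t_data as (account_id, time_key)
T_DATA = [(1, 1000), (1, 1010), (1, 1020), (1, 1030), (2, 1040),
          (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090)]


def distinct_in_window(rows, window=20):
    """For each row, count distinct account_ids with time_key in
    [t - window, t]: the same predicate as the correlated subquery."""
    out = []
    for acc, t in rows:
        accounts = {a for a, t2 in rows if t - window <= t2 <= t}
        out.append((acc, t, len(accounts)))
    return out


for row in distinct_in_window(T_DATA):
    print(row)
```

Filtering these rows on a count of at least 3 leaves only account 3 at time_key 1050.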

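Another approach, which may perform better, is to treat this as a gaps-and-islands problem. The following version returns the number of distinct accounts present at each time key: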

with t as (
      select account_id, min(time_key) as min_time_key, max(time_key + 20) as max_time_key
      from (select t.*,
                   -- start a new island when the gap to this account's previous event exceeds 20
                   sum(case when time_key - prev_time_key <= 20 then 0 else 1 end)
                       over (partition by account_id order by time_key) as grp
            from (select t.*, lag(time_key) over (partition by account_id order by time_key) as prev_time_key
                  from t_data t
                 ) t
           ) t
      -- one row per island, covering [first event, last event + 20]
      group by account_id, grp
     )
select td.account_id, td.time_key, count(distinct t.account_id) as num_distinct
from t_data td join
     t
     on td.time_key between t.min_time_key and t.max_time_key
group by td.account_id, td.time_key;
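The island idea can be checked outside the database. This Python sketch builds islands per account (a new island starts when the gap to that account's previous event exceeds 20), stretches each island to [first event, last event + 20], and counts the distinct accounts whose islands cover each time_key. It assumes the sample data and integer time keys; `islands` and `distinct_via_islands` are hypothetical names.

```python
from collections import defaultdict

# Sample rows from t_data as (account_id, time_key)
T_DATA = [(1, 1000), (1, 1010), (1, 1020), (1, 1030), (2, 1040),
          (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090)]


def islands(rows, window=20):
    """Return (account_id, start, end) islands, where end = last event + window."""
    by_account = defaultdict(list)
    for acc, t in rows:
        by_account[acc].append(t)
    result = []
    for acc, times in by_account.items():
        times.sort()
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > window:  # gap too large: close the current island
                result.append((acc, start, prev + window))
                start = t
            prev = t
        result.append((acc, start, prev + window))
    return result


def distinct_via_islands(rows, window=20):
    """For each row, count distinct accounts whose island covers its time_key."""
    isl = islands(rows, window)
    out = []
    for acc, t in rows:
        covering = {a for a, lo, hi in isl if lo <= t <= hi}
        out.append((acc, t, len(covering)))
    return out


print(islands(T_DATA))
print(distinct_via_islands(T_DATA))
```

Because consecutive events inside an island are at most 20 apart, the union of their [event, event + 20] intervals is one contiguous range, so the counts match the correlated-subquery definition.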

Finally, if you are only looking for 3 (or 2) account IDs, and you only care about the rows where that maximum is reached, you can do the following:

select t.*
from (select t.*,
             min(account_id) over (order by time_key range between 20 preceding and 1 preceding) as min_account_id,
             max(account_id) over (order by time_key range between 20 preceding and 1 preceding) as max_account_id
      from t_data t
     ) t
where min_account_id <> max_account_id and
      account_id <> min_account_id and
      account_id <> max_account_id;

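This takes the minimum and maximum account IDs over the preceding 20 time_key units (a RANGE window, not 20 rows), excluding the current row. If those two values differ from each other and from the current row's value, you have three distinct values.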
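A small Python sketch of the same min/max test, assuming the sample data and integer time keys (so RANGE BETWEEN 20 PRECEDING AND 1 PRECEDING becomes time_key in [t - 20, t - 1]); `min_max_trick` is a hypothetical name. An empty window yields NULL for MIN/MAX in SQL, so the WHERE clause cannot be true, and the sketch skips such rows.

```python
# Sample rows from t_data as (account_id, time_key)
T_DATA = [(1, 1000), (1, 1010), (1, 1020), (1, 1030), (2, 1040),
          (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090)]


def min_max_trick(rows, window=20):
    """Keep rows whose account differs from both the min and the max account
    seen in the preceding window, where that min and max also differ."""
    hits = []
    for acc, t in rows:
        prev = [a for a, t2 in rows if t - window <= t2 <= t - 1]
        if not prev:
            continue  # MIN/MAX would be NULL, so the SQL predicate is not true
        lo, hi = min(prev), max(prev)
        if lo != hi and acc != lo and acc != hi:
            hits.append((acc, t))
    return hits


print(min_max_trick(T_DATA))
```

Three pairwise-different values (min, max, current) are enough to prove at least three distinct accounts, which is why this only generalizes to a threshold of 3 (or 2).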

Answer 1 (score: 1)

Here is a very simple approach. Its performance can be improved; you may want to post some details about the size of your table.

select t1.account_id, t1.time_key, count(distinct t2.account_id) cnt
from t_data t1 cross join t_data t2
where t2.time_key between t1.time_key - 20 and t1.time_key
group by t1.account_id, t1.time_key
having count(distinct t2.account_id) >= 3;
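The self-join compares every row with every other row. As an illustration of how the same counts can be obtained in a single pass when rows are processed outside SQL, here is a two-pointer sliding window over rows sorted by time_key that keeps a `collections.Counter` of the accounts currently in the window. This technique is not from the answer; the data and the name `sliding_distinct` are assumptions for the sketch.

```python
from collections import Counter

# Sample rows from t_data as (account_id, time_key)
T_DATA = [(1, 1000), (1, 1010), (1, 1020), (1, 1030), (2, 1040),
          (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090)]


def sliding_distinct(rows, window=20):
    """Count distinct account_ids with time_key in [t - window, t] for each
    row, using two pointers instead of a self-join."""
    rows = sorted(rows, key=lambda r: r[1])
    counts = Counter()
    out = []
    left = right = 0
    for acc, t in rows:
        # Add every row with time_key <= t (peers included, like RANGE ... CURRENT ROW)
        while right < len(rows) and rows[right][1] <= t:
            counts[rows[right][0]] += 1
            right += 1
        # Drop rows that have fallen out of the window
        while rows[left][1] < t - window:
            a = rows[left][0]
            counts[a] -= 1
            if counts[a] == 0:
                del counts[a]
            left += 1
        out.append((acc, t, len(counts)))
    return out


print(sliding_distinct(T_DATA))
```

After sorting, each row enters and leaves the Counter once, so the work is dominated by the sort rather than by pairwise comparisons.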

Answer 2 (score: 1)

If you really want to use only a single window clause, here is one approach:

with product_of_primes as (
select t.*, round(exp(sum(ln(decode(account_id,1,2,2,3,3,5))) 
       over ( order by time_key range between 20 preceding
                   and current row ))) product from t_data t
)
select account_id, time_key from product_of_primes
where mod(product,2*3*5) = 0;

Explanation:

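  • Map each distinct account_id to a prime number: the first account_id becomes 2, the next 3, and the next 5.
  • Take the natural logarithm of that number.
  • Sum the natural logs of all events in the past hour (that is, in our window), remembering that ln(a) + ln(b) = ln(a * b).
  • Raise e to the power of that sum.
  • (So far, this is just a long-winded way of multiplying together all the primes mapped to our account_ids.)
  • Every row whose result is evenly divisible by all three primes we used (2, 3 and 5, hence divisible by 30) has all three distinct account_ids in its window.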
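The steps above can be checked with a short Python sketch: map accounts to primes (the same mapping as the decode call), compute the window product both through exp(sum(ln(...))) and exactly with `math.prod`, and keep rows divisible by 30. It assumes the sample data and integer time keys; `PRIME` and `prime_products` are hypothetical names. Note that this sketch uses floating point, so for very long windows the rounded exp/ln result could drift from the exact product.

```python
import math

# Sample rows from t_data as (account_id, time_key)
T_DATA = [(1, 1000), (1, 1010), (1, 1020), (1, 1030), (2, 1040),
          (3, 1050), (3, 1060), (3, 1070), (3, 1080), (3, 1090)]

PRIME = {1: 2, 2: 3, 3: 5}  # same mapping as decode(account_id, 1, 2, 2, 3, 3, 5)


def prime_products(rows, window=20):
    """Return (account_id, time_key, product via logs, exact product) per row."""
    out = []
    for acc, t in rows:
        primes = [PRIME[a] for a, t2 in rows if t - window <= t2 <= t]
        via_logs = round(math.exp(sum(math.log(p) for p in primes)))  # round(exp(sum(ln(...))))
        exact = math.prod(primes)  # what the log trick is computing
        out.append((acc, t, via_logs, exact))
    return out


for acc, t, product, exact in prime_products(T_DATA):
    if product % (2 * 3 * 5) == 0:
        print(acc, t, product)  # prints: 3 1050 30
```

Because prime factorizations are unique, a product divisible by 2, 3 and 5 means each of the three accounts occurs at least once in the window.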

If you were on my team and wrote this, I would kill you.

Full example with data:

with t_data as (
select 1 as account_id, 1000 as time_key from dual union
select 1 as account_id, 1010 as time_key from dual union
select 1 as account_id, 1020 as time_key from dual union
select 1 as account_id, 1030 as time_key from dual union
select 2 as account_id, 1040 as time_key from dual union
select 3 as account_id, 1050 as time_key from dual union
select 3 as account_id, 1060 as time_key from dual union
select 3 as account_id, 1070 as time_key from dual union
select 3 as account_id, 1080 as time_key from dual union
select 3 as account_id, 1090 as time_key from dual
order by time_key
),
product_of_primes as (
select t.*, round(exp(sum(ln(decode(account_id,1,2,2,3,3,5))) 
        over ( order by time_key range between 20 preceding 
               and current row ))) product from t_data t
)
select account_id, time_key, product from product_of_primes
where mod(product,2*3*5) = 0;

Result:

+------------+----------+---------+
| ACCOUNT_ID | TIME_KEY | PRODUCT |
+------------+----------+---------+
|          3 |     1050 |      30 |
+------------+----------+---------+