Count of unique dates

Time: 2015-11-06 12:58:41

Tags: sql oracle query-optimization

I have a table:

CREATE TABLE my_table 
  (gr_id      NUMBER,
   start_date DATE,
   end_date   DATE);

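All dates always have a zero time component. I need the fastest way to count the number of unique dates within each gr_id.

For example, given these rows (dd.mm.rrrr):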

1 | 01.01.2000 | 07.01.2000
1 | 01.01.2000 | 07.01.2000
2 | 01.01.2000 | 03.01.2000
2 | 05.01.2000 | 07.01.2000
3 | 01.01.2000 | 04.01.2000
3 | 03.01.2000 | 05.01.2000

then the correct answer would be

1 | 7 
2 | 6 
3 | 5
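
To make the expected result concrete, here is a minimal Python sketch (not part of the original question) that expands each inclusive range into individual days and counts the distinct days per gr_id. The `rows` list mirrors the sample data above. The helper name `count_unique_days` and the choice to skip rows with no end date are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (gr_id, start_date, end_date), both ends inclusive.
rows = [
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (2, date(2000, 1, 1), date(2000, 1, 3)),
    (2, date(2000, 1, 5), date(2000, 1, 7)),
    (3, date(2000, 1, 1), date(2000, 1, 4)),
    (3, date(2000, 1, 3), date(2000, 1, 5)),
]


def count_unique_days(rows):
    """Hypothetical helper: distinct days covered per gr_id (brute force)."""
    days = defaultdict(set)
    for gr_id, start, end in rows:
        if end is None:  # assumption: open-ended rows are ignored
            continue
        # Add every day from start to end, inclusive; the set removes overlaps.
        for offset in range((end - start).days + 1):
            days[gr_id].add(start + timedelta(days=offset))
    return {gr_id: len(d) for gr_id, d in sorted(days.items())}


print(count_unique_days(rows))  # {1: 7, 2: 6, 3: 5}
```

This is the same idea as joining against a calendar of dates: generate one entry per covered day, then count distinct entries per group.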

Currently I use an additional table

CREATE TABLE mfr_date_list
   (MFR_DATE DATE);

which holds every date between 01.01.2000 and 31.12.2020, and the following query:

SELECT COUNT(DISTINCT mfr_date_list.mfr_date) cnt, 
       dt.gr_id
  FROM dwh_mfr.mfr_date_list,
        (SELECT gr_id, 
                start_date AS sd, 
                end_date AS ed
               FROM my_table
        ) dt
 WHERE mfr_date_list.mfr_date BETWEEN dt.sd AND dt.ed
   AND dt.ed IS NOT NULL
 GROUP BY dt.gr_id

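This query returns the correct result set, but I don't think it is the fastest way. I think there is some way to build the query based on the table mfr_date_list.

Oracle 11.2, 64-bit.

2 answers:

Answer 0 (score: 1)

I'd expect that what you're doing is the fastest way (as always, test it). Your query can be simplified, but that only helps readability, not necessarily speed: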

select t.gr_id, count(distinct dl.mfr_date) as cnt
  from my_table t
  join mfr_date_list dl
    on dl.mfr_date between t.start_date and t.end_date
 where t.end_date is not null
 group by t.gr_id

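Whatever you do, you have to generate the dates between the two endpoints somehow, because you need to remove the overlaps. One way is to use CAST(MULTISET()), as Lalit Kumar explains: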

select gr_id, count(distinct end_date - column_value + 1)
  from my_table m
 cross join table(cast(multiset(select level
                                  from dual
                               connect by level <= m.end_date - m.start_date + 1
                                       ) as sys.odcinumberlist))
 group by gr_id;

     GR_ID COUNT(DISTINCTEND_DATE-COLUMN_VALUE+1)
---------- --------------------------------------
         1                                      7
         2                                      6
         3                                      5

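This is very Oracle-specific, but it should perform better than most other row generators: you access the table only once, and because the row generator is correlated with MY_TABLE, you generate only the minimum number of rows required.

Answer 1 (score: 0)

What you really need to do is combine the ranges and then count their lengths. This can be quite challenging because of duplicate dates. Here is one way to approach the problem.

First, enumerate the dates and determine whether each date is "in" or "out". When the cumulative sum is 0, it is "out":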

select t.gr_id, dt,
       sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from (select t.gr_id, t.start_date as dt, 1 as inc
      from my_table t
      union all
      select t.gr_id, t.end_date + 1, -1 as inc
      from my_table t
     ) t
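
As a rough illustration of this step, the Python sketch below (an assumption-laden model, not the answer's own code) turns each range into a +1 event on start_date and a -1 event on end_date + 1, then computes the running total per gr_id. The helper name `running_coverage` is hypothetical, and events that share a date are combined into a single step for simplicity.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (gr_id, start_date, end_date), both ends inclusive.
rows = [
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (2, date(2000, 1, 1), date(2000, 1, 3)),
    (2, date(2000, 1, 5), date(2000, 1, 7)),
    (3, date(2000, 1, 1), date(2000, 1, 4)),
    (3, date(2000, 1, 3), date(2000, 1, 5)),
]


def running_coverage(rows):
    """Hypothetical helper: per gr_id, a list of (date, number of open ranges)."""
    deltas = defaultdict(lambda: defaultdict(int))
    for gr_id, start, end in rows:
        deltas[gr_id][start] += 1                    # a range opens on start_date
        deltas[gr_id][end + timedelta(days=1)] -= 1  # and closes the day after end_date
    result = {}
    for gr_id, by_date in deltas.items():
        total, steps = 0, []
        for dt in sorted(by_date):
            total += by_date[dt]
            steps.append((dt, total))  # total == 0 means the group is "out" from dt on
        result[gr_id] = steps
    return result


for dt, open_ranges in running_coverage(rows)[2]:
    print(dt, open_ranges)
```

For gr_id 2 the running total drops to 0 on 2000-01-04 and rises again on 2000-01-05, which is exactly the one-day gap between its two ranges.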

Then, use lead() to determine the length of each period:

with inc as (
      select t.gr_id, dt,
             sum(inc) over (partition by t.gr_id order by dt) as cume_inc
      from (select t.gr_id, t.start_date as dt, 1 as inc
            from my_table t
            union all
            select t.gr_id, t.end_date + 1, -1 as inc
            from my_table t
           ) t
     )
select t.gr_id,
       sum(nextdt - dt) as daysInUse
from (select inc.*, lead(dt) over (partition by inc.gr_id order by dt) as nextdt
      from inc
     ) t
where t.cume_inc > 0  -- count only stretches where at least one range is open
group by t.gr_id;

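This is close to what you want. There are two remaining challenges: (1) the limits and (2) handling ties. The following should work (although there may be off-by-one and boundary issues):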

with inc as (
      select t.gr_id, dt, priority,
             sum(inc) over (partition by t.gr_id order by dt) as cume_inc
      from ((select t.gr_id, t.start_date as dt, count(*) as inc, 1 as priority
             from my_table t
             group by t.gr_id, t.start_date
            )
            union all
            (select t.gr_id, t.end_date + 1, - count(*) as inc, -1
             from my_table t
             group by t.gr_id, t.end_date
            )
           ) t
     )
select t.gr_id,
       sum(least(nextdt, date '2020-12-31') - greatest(dt, date '2000-01-01')) as daysInUse
from (select inc.*, lead(dt) over (partition by inc.gr_id order by dt, priority) as nextdt
      from inc
     ) t
where t.cume_inc > 0  -- count only stretches where at least one range is open
group by t.gr_id;
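
The range-merging idea can be checked outside the database with a small Python model. This is a sketch under stated assumptions rather than a translation of the query: the helper name `days_in_use` and its `lo`/`hi` parameters are hypothetical, the limits are treated as inclusive dates, and events on the same date are netted together before the sweep so that ties need no special ordering.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (gr_id, start_date, end_date), both ends inclusive.
rows = [
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (2, date(2000, 1, 1), date(2000, 1, 3)),
    (2, date(2000, 1, 5), date(2000, 1, 7)),
    (3, date(2000, 1, 1), date(2000, 1, 4)),
    (3, date(2000, 1, 3), date(2000, 1, 5)),
]


def days_in_use(rows, lo, hi):
    """Hypothetical helper: covered days per gr_id, clamped to [lo, hi] inclusive."""
    deltas = defaultdict(lambda: defaultdict(int))
    for gr_id, start, end in rows:
        deltas[gr_id][start] += 1
        deltas[gr_id][end + timedelta(days=1)] -= 1
    hi_excl = hi + timedelta(days=1)  # half-open upper bound avoids an off-by-one
    totals = {}
    for gr_id, by_date in sorted(deltas.items()):
        open_ranges, covered = 0, 0
        dates = sorted(by_date)
        # Walk consecutive event dates; each pair delimits a half-open segment.
        for dt, nextdt in zip(dates, dates[1:]):
            open_ranges += by_date[dt]
            if open_ranges > 0:  # segment is covered by at least one range
                seg_start, seg_end = max(dt, lo), min(nextdt, hi_excl)
                covered += max((seg_end - seg_start).days, 0)
        totals[gr_id] = covered
    return totals


print(days_in_use(rows, date(2000, 1, 1), date(2020, 12, 31)))  # {1: 7, 2: 6, 3: 5}
```

Unlike generating one row per day, the work here grows with the number of ranges rather than with their length, which is the reason this approach may pay off when ranges are long.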