I have a table:
CREATE TABLE my_table
(gr_id NUMBER,
start_date DATE,
end_date DATE);
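All dates always have a zero time component. I need to know the fastest way to count the number of unique dates within each gr_id.
For example, given these rows (dd.mm.rrrr):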
1 | 01.01.2000 | 07.01.2000
1 | 01.01.2000 | 07.01.2000
2 | 01.01.2000 | 03.01.2000
2 | 05.01.2000 | 07.01.2000
3 | 01.01.2000 | 04.01.2000
3 | 03.01.2000 | 05.01.2000
the correct answer would be:
1 | 7
2 | 6
3 | 5
Currently I use an additional table
CREATE TABLE mfr_date_list
(MFR_DATE DATE);
that holds every date between 01.01.2000 and 31.12.2020, together with the following query:
SELECT COUNT(DISTINCT mfr_date_list.mfr_date) cnt,
dt.gr_id
FROM dwh_mfr.mfr_date_list,
(SELECT gr_id,
start_date AS sd,
end_date AS ed
FROM my_table
) dt
WHERE mfr_date_list.mfr_date BETWEEN dt.sd AND dt.ed
AND dt.ed IS NOT NULL
GROUP BY dt.gr_id
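This query returns the correct result set, but I don't think it is the fastest way. I suspect the query can be built in a better way than this join against mfr_date_list.
Oracle 11.2, 64-bit.
Answer 0 (score: 1)
I would expect what you are doing to be the fastest way (as always, test it). Your query can be simplified, though this only helps readability and not necessarily speed: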
select t.gr_id, count(distinct dl.mfr_date) as cnt
from my_table t
join mfr_date_list dl
on dl.mfr_date between t.start_date and t.end_date
where t.end_date is not null
group by t.gr_id
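Whatever you do, you have to generate the dates between the two endpoints somehow, because you need to remove the overlaps. One way is to use CAST(MULTISET()), as Lalit Kumar explains: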
select gr_id, count(distinct end_date - column_value + 1)
from my_table m
cross join table(cast(multiset(select level
from dual
connect by level <= m.end_date - m.start_date + 1
) as sys.odcinumberlist))
group by gr_id;
GR_ID COUNT(DISTINCTEND_DATE-COLUMN_VALUE+1)
---------- --------------------------------------
1 7
2 6
3 5
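This is very Oracle-specific, but it should perform better than most other row generators: you access the table only once, and because the CONNECT BY condition is tied to the current MY_TABLE row, you generate only the minimum number of rows required.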
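As a cross-check outside the database, the row-generation idea (expand every range into its days, then count the distinct days per group) can be sketched in Python. This is only an illustration under stated assumptions: `ROWS` and `count_distinct_days` are hypothetical names, and the data is the sample from the question.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (gr_id, start_date, end_date), both ends inclusive.
ROWS = [
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (2, date(2000, 1, 1), date(2000, 1, 3)),
    (2, date(2000, 1, 5), date(2000, 1, 7)),
    (3, date(2000, 1, 1), date(2000, 1, 4)),
    (3, date(2000, 1, 3), date(2000, 1, 5)),
]

def count_distinct_days(rows):
    """Expand every range into its days and count the distinct days per gr_id."""
    days = defaultdict(set)
    for gr_id, start, end in rows:
        if start is None or end is None:
            continue  # rows without both endpoints contribute no days
        # One generated "row" per day of the range, like the CONNECT BY LEVEL generator.
        for offset in range((end - start).days + 1):
            days[gr_id].add(start + timedelta(days=offset))
    return {gr_id: len(seen) for gr_id, seen in days.items()}

print(count_distinct_days(ROWS))  # {1: 7, 2: 6, 3: 5}
```

The set plays the role of COUNT(DISTINCT ...): overlapping and duplicated ranges add the same day more than once, but it is counted only once.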
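Answer 1 (score: 0)
What you really need to do is combine the ranges and then sum their lengths. This can be quite challenging because of duplicate dates. Here is one way to approach the problem.
First, enumerate the dates and determine whether each date is "in" or "out". When the cumulative sum is 0, the date is "out":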
select t.gr_id, dt,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from (select t.gr_id, t.start_date as dt, 1 as inc
from my_table t
union all
select t.gr_id, t.end_date + 1, -1 as inc
from my_table t
) t
Then, use lead() to determine the length of each period:
with inc as (
select t.gr_id, dt,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from (select t.gr_id, t.start_date as dt, 1 as inc
from my_table t
union all
select t.gr_id, t.end_date + 1, -1 as inc
from my_table t
) t
)
select t.gr_id,
sum(nextdt - dt) as daysInUse
from (select inc.*, lead(dt) over (partition by inc.gr_id order by dt) as nextdt
from inc
) t
group by t.gr_id;
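This is close to what you want. Two challenges remain: (1) the date limits and (2) handling ties. The following should work (although there may be off-by-one and boundary issues):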
with inc as (
select t.gr_id, dt, priority,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from ((select t.gr_id, t.start_date as dt, count(*) as inc, 1 as priority
from my_table t
group by t.gr_id, t.start_date
)
union all
(select t.gr_id, t.end_date + 1, - count(*) as inc, -1
from my_table t
group by t.gr_id, t.end_date
)
) t
)
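select t.gr_id,
-- clamp each period to the calendar limits; the upper limit is exclusive, hence + 1
sum(least(nextdt, date '2020-12-31' + 1) - greatest(dt, date '2000-01-01')) as daysInUse
from (select inc.*, lead(dt) over (partition by inc.gr_id order by dt, priority) as nextdt
from inc
) t
where t.cume_inc > 0 -- skip the gaps, where no range is open
group by t.gr_id;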
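To sanity-check the combine-then-measure logic without a database, here is a minimal Python sketch of the same +1/-1 event idea. It is an illustration rather than a translation of the SQL: `ROWS` and `days_in_use` are hypothetical names, the data is the sample from the question, and the clamping to calendar limits is left out.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (gr_id, start_date, end_date), both ends inclusive.
ROWS = [
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (1, date(2000, 1, 1), date(2000, 1, 7)),
    (2, date(2000, 1, 1), date(2000, 1, 3)),
    (2, date(2000, 1, 5), date(2000, 1, 7)),
    (3, date(2000, 1, 1), date(2000, 1, 4)),
    (3, date(2000, 1, 3), date(2000, 1, 5)),
]

def days_in_use(rows):
    """Sum the lengths of the merged ranges per gr_id using +1/-1 events."""
    # Net change in the number of open ranges on each date, per group.
    events = defaultdict(lambda: defaultdict(int))
    for gr_id, start, end in rows:
        events[gr_id][start] += 1                    # a range opens on start_date
        events[gr_id][end + timedelta(days=1)] -= 1  # and closes the day after end_date
    result = {}
    for gr_id, deltas in events.items():
        total, open_ranges, prev = 0, 0, None
        for day in sorted(deltas):
            if open_ranges > 0:
                # At least one range covered the whole span from prev up to day.
                total += (day - prev).days
            open_ranges += deltas[day]
            prev = day
        result[gr_id] = total
    return result

print(days_in_use(ROWS))  # {1: 7, 2: 6, 3: 5}
```

Summing the deltas per date before the sweep is what removes the tie problem: two events on the same date collapse into one net change, so the order in which they are processed no longer matters.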