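I'm having trouble coming up with a query for the following task.
I have the following data and want to find the total network days for each unique ID: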
ID From To NetworkDay
1 03-Sep-12 07-Sep-12 5
1 03-Sep-12 04-Sep-12 2
1 05-Sep-12 06-Sep-12 2
1 06-Sep-12 12-Sep-12 5
1 31-Aug-12 04-Sep-12 3
2 04-Sep-12 06-Sep-12 3
2 11-Sep-12 13-Sep-12 3
2 05-Sep-12 08-Sep-12 3
The problem is that the date ranges can overlap, and I can't come up with SQL that gives me the following result:
ID From To NetworkDay
1 31-Aug-12 12-Sep-12 9
2 04-Sep-12 08-Sep-12 4
2 11-Sep-12 13-Sep-12 3
and then
ID Total Network Day
1 9
2 7
If the network-day calculation isn't possible, just getting to the second table is enough.
I hope my question is clear.
Answer 0 (score: 2)
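We can do this in Oracle using Oracle analytic functions (the "OVER ... PARTITION BY" clause). The PARTITION BY clause is a bit like GROUP BY but without the aggregation part. That means we can group rows together (i.e. partition them) and operate on each group separately. As we process each row, we can access columns of the preceding rows. This is what PARTITION BY gives us. (PARTITION BY has nothing to do with table partitioning.)
So how do we output non-overlapping dates? We first order the query by the (ID, DFROM) fields, then use the ID field to create partitions (groups of rows). We then test the previous row's TO value against the current row's FROM value for overlap, using an expression like this (in pseudocode):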
max(previous.DTO, current.DFROM) as DFROM
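If there is no overlap, this basic expression returns the original DFROM value; if there is an overlap, it returns the previous TO value. Because our rows are ordered, we only need to look at the previous row, provided previous.DTO carries the latest end date seen so far. If the previous row completely overlaps the current row, we want the current row to have a "zero" date range. So we do the same thing for the DTO field: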
max(previous.DTO, current.DFROM) as DFROM, max(previous.DTO, current.DTO) as DTO
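Once we have generated the new result set with the adjusted DFROM and DTO values, we can aggregate them and compute the interval between DFROM and DTO.
Note that most date calculations in databases are not inclusive, unlike your data. Something like DATEDIFF(dto, dfrom) will not count the end date itself, so we first push dto forward by one day.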
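The clamping idea described above can be sketched outside the database. The following Python snippet is only an illustration under stated assumptions: the function name clamp_overlaps and the tuple layout are made up for this example, and it counts calendar days, not network days.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (id, dfrom, dto), both dates inclusive.
rows = [
    (1, date(2012, 9, 3), date(2012, 9, 7)),
    (1, date(2012, 9, 3), date(2012, 9, 4)),
    (1, date(2012, 9, 5), date(2012, 9, 6)),
    (1, date(2012, 9, 6), date(2012, 9, 12)),
    (1, date(2012, 8, 31), date(2012, 9, 4)),
    (2, date(2012, 9, 4), date(2012, 9, 6)),
    (2, date(2012, 9, 11), date(2012, 9, 13)),
    (2, date(2012, 9, 5), date(2012, 9, 8)),
]


def clamp_overlaps(rows):
    """Yield (id, dfrom, dto) with dto exclusive and overlaps clamped away.

    Hypothetical helper for illustration; not part of any library.
    """
    ordered = sorted(rows)  # order by id, then dfrom
    for key, group in groupby(ordered, key=lambda r: r[0]):
        last_end = None  # latest (exclusive) end date seen so far for this id
        for _, dfrom, dto in group:
            dto = dto + timedelta(days=1)  # make the end date exclusive
            if last_end is not None:
                dfrom = max(dfrom, last_end)
                dto = max(dto, last_end)
            last_end = dto
            yield key, dfrom, dto


# Sum the lengths of the clamped ranges per id.
totals = {}
for key, dfrom, dto in clamp_overlaps(rows):
    totals[key] = totals.get(key, 0) + (dto - dfrom).days

print(totals)  # {1: 13, 2: 8} calendar days
```

Rows that are fully covered by earlier rows collapse to zero-length ranges, so they add nothing to the sum.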
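I no longer have access to an Oracle server, but I know Oracle analytic functions can do this. The query should look something like this (if you get it working, please update my post):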
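-- untested sketch; GREATEST is Oracle's row-wise maximum, NVL covers the first row of each id
SELECT id,
       GREATEST(dfrom, NVL(prev_dto, dfrom)) AS dfrom,
       GREATEST(dto, NVL(prev_dto, dto)) AS dto
FROM (
  SELECT id, dfrom, dto,
         MAX(dto) OVER (PARTITION BY id ORDER BY dfrom
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_dto
  FROM (SELECT id, dfrom, dto + 1 AS dto FROM my_sample)  -- adjust the table so that dto becomes non-inclusive
) s
ORDER BY id, dfrom;
The key here is the MAX(dto) OVER (PARTITION BY id ORDER BY dfrom ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) expression, which returns the latest dto among the rows before the current one within the same id. So this query should output new, non-overlapping dfrom/dto values. From there it is simply a matter of querying (dto - dfrom) and summing it up.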
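I do have access to a MySQL server, so I did get it working there. MySQL (before version 8.0) has no result partitioning (analytics) like Oracle's, so we have to use user-defined variables instead. That means we use expressions of the form @var := xxx to remember the last date value and adjust dfrom/dto. It is the same algorithm, just a bit longer and with more convoluted syntax. We also have to forget the last date value whenever the ID field changes!
So here is the sample table (with the same values you have):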
create table sample(id int, dfrom date, dto date, networkDay int);
insert into sample values
(1,'2012-09-03','2012-09-07',5),
(1,'2012-09-03','2012-09-04',2),
(1,'2012-09-05','2012-09-06',2),
(1,'2012-09-06','2012-09-12',5),
(1,'2012-08-31','2012-09-04',3),
(2,'2012-09-04','2012-09-06',3),
(2,'2012-09-11','2012-09-13',3),
(2,'2012-09-05','2012-09-08',3);
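In the query we output the ungrouped result set described above. The variable @ldt is the "last date" and the variable @lid is the "last id". Whenever @lid changes, we reset @ldt to null. FYI, in MySQL the := operator is where assignment happens; a plain = operator is just an equality test.
This is a three-level query, but it could be reduced to two; I used the extra outer query to keep it more readable. The innermost query is simple: it adjusts the dto column to be non-inclusive and does the proper row ordering. The middle query adjusts the dfrom/dto values so that they do not overlap. The outer query simply drops the unused fields and computes the interval length.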
set @ldt=null, @lid=null;
select id, no_dfrom as dfrom, no_dto as dto, datediff(no_dto, no_dfrom) as days
from (
  select if(@lid=id,@ldt,@ldt:=null) as last,        -- forget the last date when the id changes
         dfrom, dto,
         if(@ldt>=dfrom,@ldt,dfrom) as no_dfrom,      -- clamp the start to the last end date
         if(@ldt>=dto,@ldt,dto) as no_dto,            -- clamp the end the same way
         @ldt:=if(@ldt>=dto,@ldt,dto),                -- remember the latest end date
         @lid:=id as id,
         datediff(dto, dfrom) as overlapped_days
  from (select id, dfrom, dto + INTERVAL 1 DAY as dto from sample order by id, dfrom) as sample
) as nonoverlapped
order by id, dfrom;
The query above gives these results (notice that dfrom/dto do not overlap here):
+------+------------+------------+------+
| id | dfrom | dto | days |
+------+------------+------------+------+
| 1 | 2012-08-31 | 2012-09-05 | 5 |
| 1 | 2012-09-05 | 2012-09-08 | 3 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-13 | 5 |
| 2 | 2012-09-04 | 2012-09-07 | 3 |
| 2 | 2012-09-07 | 2012-09-09 | 2 |
| 2 | 2012-09-11 | 2012-09-14 | 3 |
+------+------------+------------+------+
Answer 1 (score: 0)
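Here is how to construct SQL that merges the intervals by removing the gaps and keeping only the maximal intervals. It goes something like this (untested; note that From and To are reserved words in most SQL dialects, so they would need to be quoted or renamed):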
SELECT DISTINCT F.ID, F.From, L.To
FROM Temp AS F, Temp AS L
WHERE F.From < L.To AND F.ID = L.ID
AND NOT EXISTS (SELECT *
FROM Temp AS T
WHERE T.ID = F.ID
AND F.From < T.From AND T.From < L.To
AND NOT EXISTS ( SELECT *
FROM Temp AS T1
WHERE T1.ID = F.ID
AND T1.From < T.From
AND T.From <= T1.To)
)
AND NOT EXISTS (SELECT *
FROM Temp AS T2
WHERE T2.ID = F.ID
AND (
(T2.From < F.From AND F.From <= T2.To)
OR (T2.From < L.To AND L.To < T2.To)
)
)
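For comparison, the same "merge into maximal intervals" idea can be sketched procedurally. This Python snippet is an assumption-laden illustration, not a translation of the SQL above: the name merge_ranges is hypothetical, dates are treated as inclusive, and two ranges are merged only when they share at least one day.

```python
from datetime import date

# Sample rows from the question: (id, dfrom, dto), both dates inclusive.
rows = [
    (1, date(2012, 9, 3), date(2012, 9, 7)),
    (1, date(2012, 9, 3), date(2012, 9, 4)),
    (1, date(2012, 9, 5), date(2012, 9, 6)),
    (1, date(2012, 9, 6), date(2012, 9, 12)),
    (1, date(2012, 8, 31), date(2012, 9, 4)),
    (2, date(2012, 9, 4), date(2012, 9, 6)),
    (2, date(2012, 9, 11), date(2012, 9, 13)),
    (2, date(2012, 9, 5), date(2012, 9, 8)),
]


def merge_ranges(rows):
    """Merge overlapping inclusive date ranges per id (hypothetical helper)."""
    merged = []
    for key, dfrom, dto in sorted(rows):  # order by id, then dfrom
        same_id = bool(merged) and merged[-1][0] == key
        if same_id and dfrom <= merged[-1][2]:
            # Starts inside the current merged range: extend its end if needed.
            merged[-1][2] = max(merged[-1][2], dto)
        else:
            # New id, or a gap before this range: start a new merged range.
            merged.append([key, dfrom, dto])
    return [tuple(m) for m in merged]


for key, dfrom, dto in merge_ranges(rows):
    print(key, dfrom.isoformat(), dto.isoformat())
```

On the sample data this yields one range for ID 1 (31-Aug-12 to 12-Sep-12) and two for ID 2 (04-Sep-12 to 08-Sep-12 and 11-Sep-12 to 13-Sep-12), which matches the second table in the question.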
Answer 2 (score: 0)
with t_data as (
select 1 as id,
to_date('03-sep-12','dd-mon-yy') as start_date,
to_date('07-sep-12','dd-mon-yy') as end_date from dual
union all
select 1,
to_date('03-sep-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('05-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('06-sep-12','dd-mon-yy'),
to_date('12-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('31-aug-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('04-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('11-sep-12','dd-mon-yy'),
to_date('13-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('05-sep-12','dd-mon-yy'),
to_date('08-sep-12','dd-mon-yy') from dual
),
t_holidays as (
select to_date('01-jan-12','dd-mon-yy') as holiday
from dual
),
t_data_rn as (
select rownum as rn, t_data.* from t_data
),
t_model as (
select distinct id,
start_date
from t_data_rn
model
partition by (rn, id)
dimension by (0 as i)
measures(start_date, end_date)
rules
( start_date[for i
from 1
to end_date[0]-start_date[0]
increment 1] = start_date[0] + cv(i),
end_date[any] = start_date[cv()] + 1
)
order by 1,2
),
t_network_days as (
select t_model.*,
case when
mod(to_char(start_date, 'j'), 7) + 1 in (6, 7)
or t_holidays.holiday is not null
then 0 else 1
end as working_day
from t_model
left outer join t_holidays
on t_holidays.holiday = t_model.start_date
)
select id,
sum(working_day) as network_days
from t_network_days
group by id;
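t_data - your initial data
t_holidays - contains the list of holidays
t_data_rn - simply adds a row number (rownum) to each row of t_data
t_model - expands the t_data date ranges into a flat list of dates
t_network_days - marks each date from t_model as a working day or a weekend day, based on the day of the week (Saturday and Sunday) and the holiday list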
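The expand-then-count approach can also be sketched in a few lines of Python, which may help when checking the SQL output. This is only an illustrative sketch: the function name network_days is made up, and Saturday and Sunday are assumed to be the non-working days, as in the query above.

```python
from datetime import date, timedelta

# Sample rows from the question: (id, start_date, end_date), both inclusive.
rows = [
    (1, date(2012, 9, 3), date(2012, 9, 7)),
    (1, date(2012, 9, 3), date(2012, 9, 4)),
    (1, date(2012, 9, 5), date(2012, 9, 6)),
    (1, date(2012, 9, 6), date(2012, 9, 12)),
    (1, date(2012, 8, 31), date(2012, 9, 4)),
    (2, date(2012, 9, 4), date(2012, 9, 6)),
    (2, date(2012, 9, 11), date(2012, 9, 13)),
    (2, date(2012, 9, 5), date(2012, 9, 8)),
]
holidays = {date(2012, 1, 1)}  # same single holiday as t_holidays


def network_days(rows, holidays=frozenset()):
    """Count distinct working days per id (hypothetical helper)."""
    days_by_id = {}
    for key, start, end in rows:
        days = days_by_id.setdefault(key, set())
        d = start
        while d <= end:  # expand the range into individual dates
            days.add(d)  # the set removes dates covered by several ranges
            d += timedelta(days=1)
    return {
        key: sum(1 for d in days if d.weekday() < 5 and d not in holidays)
        for key, days in days_by_id.items()
    }


print(network_days(rows, holidays))  # {1: 9, 2: 7}
```

Collecting the dates in a set plays the role of the DISTINCT in t_model, and the totals agree with the ones the question asks for.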