I have a task where I need to select records by active period.
My initial data sample is:
department_id, employee_id, start_dt, end_dt
1 11 2016-01-01 2016-01-03
1 11 2016-01-04 2016-01-07
1 11 2016-01-08 2016-01-11
1 12 2016-01-12 2016-01-14
1 12 2016-01-15 2016-01-17
1 12 2016-01-18 2016-01-20
1 11 2016-01-21 2016-01-24
1 11 2016-01-25 2016-01-25
1 14 2016-01-26 2016-01-27
2 11 2016-04-01 2016-04-10
My expected output:
department_id, employee_id, start_dt, end_dt
1 11 2016-01-01 2016-01-11
1 12 2016-01-12 2016-01-20
1 11 2016-01-21 2016-01-25
1 14 2016-01-26 2016-01-27
2 11 2016-04-01 2016-04-10
I tried using MIN/MAX with PARTITION BY, but the same employee_id can appear more than once within a department_id at different times.
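To make the difficulty concrete, here is a minimal sketch of the naive attempt, assuming the sample rows are loaded into an in-memory SQLite table through Python's `sqlite3` module. The table name `t`, the constant `SAMPLE_ROWS`, and the function `naive_min_max` are hypothetical names chosen for illustration, and a plain GROUP BY stands in for the MIN/MAX attempt described above.

```python
import sqlite3

# Sample rows from the question: (department_id, employee_id, start_dt, end_dt).
SAMPLE_ROWS = [
    (1, 11, "2016-01-01", "2016-01-03"),
    (1, 11, "2016-01-04", "2016-01-07"),
    (1, 11, "2016-01-08", "2016-01-11"),
    (1, 12, "2016-01-12", "2016-01-14"),
    (1, 12, "2016-01-15", "2016-01-17"),
    (1, 12, "2016-01-18", "2016-01-20"),
    (1, 11, "2016-01-21", "2016-01-24"),
    (1, 11, "2016-01-25", "2016-01-25"),
    (1, 14, "2016-01-26", "2016-01-27"),
    (2, 11, "2016-04-01", "2016-04-10"),
]


def naive_min_max(rows):
    """Collapse rows per (department_id, employee_id) with plain MIN/MAX."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (department_id INT, employee_id INT, "
        "start_dt TEXT, end_dt TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT department_id, employee_id, MIN(start_dt), MAX(end_dt)
        FROM t
        GROUP BY department_id, employee_id
        ORDER BY department_id, MIN(start_dt)
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in naive_min_max(SAMPLE_ROWS):
        print(row)
```

With this grouping, employee 11 in department 1 comes back as a single row spanning 2016-01-01 to 2016-01-25, which swallows the period in which employee 12 was active. The two separate stints of employee 11 need to be told apart before aggregating.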
Answer 0: (score: 1)
Here is one approach using the LAG analytic function. This kind of problem is commonly called the gaps-and-islands problem.
WITH cte AS (
    -- A running sum of "employee changed" flags numbers each island
    -- of consecutive rows that belong to the same employee.
    SELECT department_id,
           employee_id,
           start_dt,
           end_dt,
           SUM(CASE WHEN prev_employee_id = employee_id THEN 0 ELSE 1 END)
               OVER (PARTITION BY department_id ORDER BY start_dt) AS counter
    FROM (-- Fetch the previous row's employee_id within the department.
          SELECT department_id,
                 employee_id,
                 start_dt,
                 end_dt,
                 LAG(employee_id, 1, NULL)
                     OVER (PARTITION BY department_id ORDER BY start_dt) AS prev_employee_id
          FROM #Table1) t
)
-- Collapse each island into a single row.
SELECT department_id,
       employee_id,
       MIN(start_dt) AS start_dt,
       MAX(end_dt)   AS end_dt
FROM cte
GROUP BY department_id,
         employee_id,
         counter
ORDER BY department_id,
         MIN(start_dt);
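The LAG approach can be checked against the sample data with a small sketch. It assumes SQLite (3.25 or later, for window functions) driven from Python's `sqlite3` module rather than the SQL Server temp table used in the answer. The table name `table1`, the column `prev_employee_id`, and the function `merge_islands` are hypothetical names for illustration.

```python
import sqlite3

# Sample rows from the question: (department_id, employee_id, start_dt, end_dt).
SAMPLE_ROWS = [
    (1, 11, "2016-01-01", "2016-01-03"),
    (1, 11, "2016-01-04", "2016-01-07"),
    (1, 11, "2016-01-08", "2016-01-11"),
    (1, 12, "2016-01-12", "2016-01-14"),
    (1, 12, "2016-01-15", "2016-01-17"),
    (1, 12, "2016-01-18", "2016-01-20"),
    (1, 11, "2016-01-21", "2016-01-24"),
    (1, 11, "2016-01-25", "2016-01-25"),
    (1, 14, "2016-01-26", "2016-01-27"),
    (2, 11, "2016-04-01", "2016-04-10"),
]

ISLANDS_SQL = """
WITH flagged AS (
    -- Previous employee_id within the department, ordered by start date.
    SELECT department_id, employee_id, start_dt, end_dt,
           LAG(employee_id) OVER (
               PARTITION BY department_id ORDER BY start_dt
           ) AS prev_employee_id
    FROM table1
),
numbered AS (
    -- Running count of employee changes: one number per island.
    SELECT department_id, employee_id, start_dt, end_dt,
           SUM(CASE WHEN prev_employee_id = employee_id THEN 0 ELSE 1 END)
               OVER (PARTITION BY department_id ORDER BY start_dt) AS island
    FROM flagged
)
SELECT department_id, employee_id, MIN(start_dt), MAX(end_dt)
FROM numbered
GROUP BY department_id, employee_id, island
ORDER BY department_id, MIN(start_dt)
"""


def merge_islands(rows):
    """Load rows into an in-memory table and merge consecutive periods."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (department_id INT, employee_id INT, "
        "start_dt TEXT, end_dt TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(ISLANDS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in merge_islands(SAMPLE_ROWS):
        print(row)
```

Note that this variant starts a new island whenever the employee changes from one row to the next within a department. It does not check whether the dates themselves are contiguous.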
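Answer 1: (score: 0)
You need to chain the active periods together. One approach is to determine when a period starts and create a flag for it. A cumulative sum of that flag then identifies each group of activity. The rest is just aggregation: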
select department_id, employee_id, min(start_dt) as start_dt,
       max(end_dt) as end_dt
from (select t.*,
             sum(IsGroupStart) over (partition by department_id, employee_id
                                     order by start_dt) as grp
      from (select t.*,
                   -- 0 if an earlier period of the same employee overlaps this one
                   -- or ends the day before it starts; otherwise 1 (a new group)
                   (case when exists (select 1
                                      from t t2
                                      where t2.department_id = t.department_id and
                                            t2.employee_id = t.employee_id and
                                            t2.start_dt < t.start_dt and  -- skip the row itself
                                            -- "+ 1" means one day; date arithmetic is
                                            -- dialect-specific (e.g. DATEADD in SQL Server)
                                            t.start_dt <= t2.end_dt + 1
                                     )
                         then 0 else 1
                    end) as IsGroupStart
            from t
           ) t
     ) t
group by department_id, employee_id, grp;
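The flag-and-cumulative-sum idea can be tried out with a short sketch, again assuming SQLite (3.25 or later) driven from Python's `sqlite3` module. The table name `periods`, the aliases `cur` and `prev`, and the function `merge_periods` are hypothetical names for illustration. SQLite's `date(x, '+1 day')` stands in for the answer's `end_dt + 1`, and the extra `prev.start_dt < cur.start_dt` condition is an assumption added so that a row does not match itself.

```python
import sqlite3

# Sample rows from the question: (department_id, employee_id, start_dt, end_dt).
SAMPLE_ROWS = [
    (1, 11, "2016-01-01", "2016-01-03"),
    (1, 11, "2016-01-04", "2016-01-07"),
    (1, 11, "2016-01-08", "2016-01-11"),
    (1, 12, "2016-01-12", "2016-01-14"),
    (1, 12, "2016-01-15", "2016-01-17"),
    (1, 12, "2016-01-18", "2016-01-20"),
    (1, 11, "2016-01-21", "2016-01-24"),
    (1, 11, "2016-01-25", "2016-01-25"),
    (1, 14, "2016-01-26", "2016-01-27"),
    (2, 11, "2016-04-01", "2016-04-10"),
]

GROUPS_SQL = """
WITH flagged AS (
    -- is_group_start = 0 when an earlier period of the same employee
    -- overlaps this one or ends the day before it starts, else 1.
    SELECT cur.department_id, cur.employee_id, cur.start_dt, cur.end_dt,
           CASE WHEN EXISTS (
               SELECT 1
               FROM periods AS prev
               WHERE prev.department_id = cur.department_id
                 AND prev.employee_id = cur.employee_id
                 AND prev.start_dt < cur.start_dt
                 AND cur.start_dt <= date(prev.end_dt, '+1 day')
           ) THEN 0 ELSE 1 END AS is_group_start
    FROM periods AS cur
),
grouped AS (
    -- Cumulative sum of the start flags labels each group of activity.
    SELECT department_id, employee_id, start_dt, end_dt,
           SUM(is_group_start) OVER (
               PARTITION BY department_id, employee_id ORDER BY start_dt
           ) AS grp
    FROM flagged
)
SELECT department_id, employee_id, MIN(start_dt), MAX(end_dt)
FROM grouped
GROUP BY department_id, employee_id, grp
ORDER BY department_id, MIN(start_dt)
"""


def merge_periods(rows):
    """Load rows into an in-memory table and merge chained active periods."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE periods (department_id INT, employee_id INT, "
        "start_dt TEXT, end_dt TEXT)"
    )
    conn.executemany("INSERT INTO periods VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(GROUPS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in merge_periods(SAMPLE_ROWS):
        print(row)
```

Unlike the LAG variant, this one groups by date contiguity per employee, so two stints of the same employee are separated only when there is a gap of more than one day between them.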