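Sample dataset, where each row records the effective period for a unique person/major combination.
Note: all records belonging to each of the person_IDs below are shown in this sample dataset.
Note 2: a Termination_Date of 9999/12/31 simply means the person is currently enrolled in that major.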
TBL_MAJORS
Person_ID  Major        Effective_Date  Termination_Date
76         Biology      2016/8/1        2017/2/24
76         Accounting   2017/2/25       2017/4/15
76         Math         2017/4/16       9999/12/31
102        Physics      2016/8/1        2018/7/21
58         Literature   2016/8/1        2017/5/13
94         Art History  2016/8/1        2017/8/30
94         Comp Sci     2018/3/1        9999/12/31
The desired final dataset, with a new column indicating whether each person was continuously enrolled in a major throughout all of 2017.
TBL_MAJORS_2
Person_ID  Major        Effective_Date  Termination_Date  Active_Throughout_2017
76         Biology      2016/8/1        2017/2/24         1
76         Accounting   2017/2/25       2017/4/15         1
76         Math         2017/4/16       9999/12/31        1
102        Physics      2016/8/1        2018/7/21         1
58         Literature   2016/8/1        2017/5/13         0
94         Art History  2016/8/1        2017/8/30         0
94         Comp Sci     2018/3/1        9999/12/31        0
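Further explanation
For example, person 76 has three different majors that together run without a break from 2016/8/1 to 9999/12/31 (i.e. they are currently enrolled), so they were continuously enrolled throughout 2017.
Person 94, on the other hand, has two different majors, but there is a gap in activity from 2017/8/31 to 2018/2/28, so they were not continuously enrolled throughout 2017.
Can anyone help me achieve this? Bonus points if you can make it perform well... I'm thinking this might involve some kind of window/analytic function, but I'm honestly stumped.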
Answer 0 (score: 2)
This can be done like so:
WITH tbl_majors AS (SELECT 76 person_id, 'Biology' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('24/02/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 76 person_id, 'Accounting' major, to_date('25/02/2017', 'dd/mm/yyyy') effective_date, to_date('15/04/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 76 person_id, 'Math' major, to_date('16/04/2017', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 102 person_id, 'Physics' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('21/07/2018', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 58 person_id, 'Literature' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('13/05/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 94 person_id, 'Art History' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('30/08/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 94 person_id, 'Comp Sci' major, to_date('01/03/2018', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual)
SELECT person_id,
       major,
       effective_date,
       termination_date,
       -- a group covers 2017 if it starts on or before 1 Jan and ends on or after 31 Dec
       CASE WHEN to_date('01/01/2017', 'dd/mm/yyyy') >= MIN(effective_date) OVER (PARTITION BY person_id, grp)
             AND to_date('31/12/2017', 'dd/mm/yyyy') <= MAX(termination_date) OVER (PARTITION BY person_id, grp)
            THEN 1
            ELSE 0
       END active_throughout_2017
FROM   (SELECT person_id,
               major,
               effective_date,
               termination_date,
               -- running total of group starts = group number
               SUM(new_grp_start) OVER (PARTITION BY person_id ORDER BY effective_date) grp
        FROM   (SELECT person_id,
                       major,
                       effective_date,
                       termination_date,
                       -- 0 when this row starts the day after the previous row ends, else 1
                       CASE WHEN effective_date = LAG(termination_date) OVER (PARTITION BY person_id ORDER BY effective_date) + 1 THEN 0 ELSE 1 END new_grp_start
                FROM   tbl_majors));
PERSON_ID  MAJOR       EFFECTIVE_DATE TERMINATION_DATE ACTIVE_THROUGHOUT_2017
---------- ----------- -------------- ---------------- ----------------------
58         Literature  01/08/2016     13/05/2017       0
76         Accounting  25/02/2017     15/04/2017       1
76         Biology     01/08/2016     24/02/2017       1
76         Math        16/04/2017     31/12/9999       1
94         Art History 01/08/2016     30/08/2017       0
94         Comp Sci    01/03/2018     31/12/9999       0
102        Physics     01/08/2016     21/07/2018       1
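First, work out whether the current row is contiguous with the previous one, i.e. whether the current row's effective_date is exactly one day after the previous row's termination_date.
Any row that is not contiguous with its predecessor (including each person's first row, which has no predecessor) marks the start of a new group. Taking a running sum of that flag gives every row in a contiguous run the same number, and the number goes up each time a new run starts.
Then it is simply a matter of finding the minimum effective_date and the maximum termination_date for each person and group, and checking that this range covers both the first and the last day of 2017.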
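To see the grouping step on its own, the two inner queries can be run without the outer CASE. This is only a sketch: it assumes tbl_majors is available as a table (or is supplied by the same WITH clause as above) and reuses the column aliases from the query above.

```sql
-- Expose the intermediate columns computed by the two inner queries.
SELECT person_id,
       major,
       effective_date,
       termination_date,
       new_grp_start,
       SUM(new_grp_start) OVER (PARTITION BY person_id ORDER BY effective_date) AS grp
FROM   (SELECT person_id,
               major,
               effective_date,
               termination_date,
               -- LAG returns NULL on a person's first row, so the comparison is
               -- not true and the ELSE branch marks that row as a group start.
               CASE WHEN effective_date = LAG(termination_date) OVER (PARTITION BY person_id ORDER BY effective_date) + 1
                    THEN 0
                    ELSE 1
               END AS new_grp_start
        FROM   tbl_majors)
ORDER BY person_id, effective_date;
```

On the sample data, person 76's three rows should come back with new_grp_start 1, 0, 0 and all share grp 1. Person 94's two rows should both have new_grp_start 1, giving grp 1 and grp 2, which is what separates the two sides of the gap.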
ETA: here is a solution based on GSazheniuk's suggestion:
WITH tbl_majors AS (SELECT 76 person_id, 'Biology' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('24/02/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 76 person_id, 'Accounting' major, to_date('25/02/2017', 'dd/mm/yyyy') effective_date, to_date('15/04/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 76 person_id, 'Math' major, to_date('16/04/2017', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 102 person_id, 'Physics' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('21/07/2018', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 58 person_id, 'Literature' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('13/05/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 94 person_id, 'Art History' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('30/08/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
SELECT 94 person_id, 'Comp Sci' major, to_date('01/03/2018', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual)
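SELECT person_id,
       major,
       effective_date,
       termination_date,
       -- days of 2017 covered by each row, summed per person and compared with the length of 2017;
       -- GREATEST(..., 0) stops rows lying wholly outside 2017 from contributing a negative count
       CASE WHEN SUM(GREATEST(LEAST(to_date('31/12/2017', 'dd/mm/yyyy'), termination_date) + 1 - GREATEST(to_date('01/01/2017', 'dd/mm/yyyy'), effective_date), 0)) OVER (PARTITION BY person_id)
                 = to_date('31/12/2017', 'dd/mm/yyyy') + 1 - to_date('01/01/2017', 'dd/mm/yyyy')
            THEN 1
            ELSE 0
       END active_throughout_2017
FROM   tbl_majors;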
As you can see, it's more elegant than my original solution. You should test both to see which works best with your data, etc.
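To check the day-counting approach by hand, it can help to look at each row's contribution before it is summed. The sketch below assumes tbl_majors is available as a table and that the date columns carry no time-of-day component; days_in_2017 is an illustrative alias. The count is clamped at zero, because a row lying wholly outside 2017 would otherwise produce a negative number.

```sql
-- Number of days of 2017 covered by each individual row.
SELECT person_id,
       major,
       GREATEST(LEAST(DATE '2017-12-31', termination_date) + 1
                - GREATEST(DATE '2017-01-01', effective_date),
                0) AS days_in_2017
FROM   tbl_majors
ORDER BY person_id, effective_date;
```

For the sample data this should give 55, 50 and 260 days for person 76 (365 in total, so the flag is 1), 365 for person 102, 133 for person 58, and 242 and 0 for person 94.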
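Both solutions assume there are no overlaps (i.e. a student has only one major at any given time); you would need to adapt them accordingly to handle cases where someone can have more than one major at once.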
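One possible adaptation of the grouping approach for overlapping majors is sketched below. It is not part of the original answer and has only been reasoned through against the sample data. Instead of comparing each row with the single previous row, it compares the row's start with the latest termination_date seen so far for that person, so a row that begins inside (or the day after) any earlier period stays in the same group. It assumes tbl_majors is available as a table.

```sql
SELECT person_id,
       major,
       effective_date,
       termination_date,
       CASE WHEN MIN(effective_date)   OVER (PARTITION BY person_id, grp) <= DATE '2017-01-01'
             AND MAX(termination_date) OVER (PARTITION BY person_id, grp) >= DATE '2017-12-31'
            THEN 1
            ELSE 0
       END AS active_throughout_2017
FROM   (SELECT person_id,
               major,
               effective_date,
               termination_date,
               SUM(new_grp_start) OVER (PARTITION BY person_id
                                        ORDER BY effective_date, termination_date) AS grp
        FROM   (SELECT person_id,
                       major,
                       effective_date,
                       termination_date,
                       -- Compare against the latest end date among all earlier rows.
                       -- effective_date - 1 is used rather than adding 1 to the end
                       -- date, since adding a day to 31/12/9999 would go out of range.
                       CASE WHEN effective_date - 1 <= MAX(termination_date) OVER (
                                     PARTITION BY person_id
                                     ORDER BY effective_date, termination_date
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                            THEN 0
                            ELSE 1
                       END AS new_grp_start
                FROM   tbl_majors));
```

On the sample data, which has no overlaps, this should return the same flags as the original query. The day-counting query is harder to adapt, because overlapping rows would count the same days twice.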