How to determine whether an entity is continuously active across multiple database records?

Time: 2018-04-03 13:42:58

Tags: sql oracle

Sample dataset, where each row records the effective period of a unique person/major combination

Note: this sample dataset contains every record for each of the person_IDs shown.

Note 2: a termination date of 9999/12/31 simply means the person is currently enrolled in that major.

TBL_MAJORS

Person_ID   Major         Effective_Date   Termination_Date
76          Biology       2016/8/1         2017/2/24   
76          Accounting    2017/2/25        2017/4/15  
76          Math          2017/4/16        9999/12/31         
102         Physics       2016/8/1         2018/7/21
58          Literature    2016/8/1         2017/5/13
94          Art History   2016/8/1         2017/8/30   
94          Comp Sci      2018/3/1         9999/12/31

Final target dataset, with a new column indicating whether each person was continuously enrolled in a major throughout all of 2017

TBL_MAJORS_2

Person_ID   Major         Effective_Date   Termination_Date   Active_Throughout_2017
76          Biology       2016/8/1         2017/2/24          1
76          Accounting    2017/2/25        2017/4/15          1
76          Math          2017/4/16        9999/12/31         1
102         Physics       2016/8/1         2018/7/21          1
58          Literature    2016/8/1         2017/5/13          0
94          Art History   2016/8/1         2017/8/30          0
94          Comp Sci      2018/3/1         9999/12/31         0

Further explanation

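So, for example, person 76 had three different majors that together run without a break from 2016/8/1 to 9999/12/31 (i.e. they are currently enrolled), so they were continuously enrolled throughout 2017.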

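On the other hand, person 94 had two different majors, but there is a gap in activity from 2017/8/31 to 2018/2/28, so they were not continuously enrolled throughout 2017.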
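To make "continuous" concrete: two periods are contiguous when the next one starts the day after the previous one ends. The Python sketch below uses a hypothetical helper `find_gaps` (not part of the original question) and assumes a person's periods never overlap; it lists the missing days between consecutive periods, reporting the 2017/8/31 to 2018/2/28 gap for person 94 and no gap for person 76.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def find_gaps(periods):
    """Return (first_missing_day, last_missing_day) for each gap between
    consecutive (effective_date, termination_date) periods of one person.
    Assumes the periods do not overlap."""
    gaps = []
    ordered = sorted(periods)
    for (_, prev_term), (next_eff, _) in zip(ordered, ordered[1:]):
        # Contiguous means the next period starts the day after the previous one ends.
        # Subtracting from next_eff avoids overflowing past date(9999, 12, 31).
        if next_eff - ONE_DAY > prev_term:
            gaps.append((prev_term + ONE_DAY, next_eff - ONE_DAY))
    return gaps

person_94 = [(date(2016, 8, 1), date(2017, 8, 30)),
             (date(2018, 3, 1), date(9999, 12, 31))]
person_76 = [(date(2016, 8, 1), date(2017, 2, 24)),
             (date(2017, 2, 25), date(2017, 4, 15)),
             (date(2017, 4, 16), date(9999, 12, 31))]

print(find_gaps(person_94))  # [(datetime.date(2017, 8, 31), datetime.date(2018, 2, 28))]
print(find_gaps(person_76))  # []
```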

Can anyone help me achieve this? Bonus points if you can make it perform well... I suspect this involves some kind of window/analytic function, but I'm honestly stumped.

1 answer:

Answer 0 (score: 2)

This can be done as follows:

WITH tbl_majors AS (SELECT 76 person_id, 'Biology' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('24/02/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 76 person_id, 'Accounting' major, to_date('25/02/2017', 'dd/mm/yyyy') effective_date, to_date('15/04/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 76 person_id, 'Math' major, to_date('16/04/2017', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 102 person_id, 'Physics' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('21/07/2018', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 58 person_id, 'Literature' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('13/05/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 94 person_id, 'Art History' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('30/08/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 94 person_id, 'Comp Sci' major, to_date('01/03/2018', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual)
SELECT person_id,
       major,
       effective_date,
       termination_date,
       -- step 3: does the group's overall date range cover all of 2017?
       CASE WHEN to_date('01/01/2017', 'dd/mm/yyyy') >= MIN(effective_date) OVER (PARTITION BY person_id, grp)
                 AND to_date('31/12/2017', 'dd/mm/yyyy') <= MAX(termination_date) OVER (PARTITION BY person_id, grp)
                 THEN 1
            ELSE 0
       END active_throughout_2017
FROM   (SELECT person_id,
               major,
               effective_date,
               termination_date,
               -- step 2: running total of the new-group flags = group number within each person
               SUM(new_grp_start) OVER (PARTITION BY person_id ORDER BY effective_date) grp
        FROM   (SELECT person_id,
                       major,
                       effective_date,
                       termination_date,
                       -- step 1: 0 if this row starts the day after the previous row ends, otherwise 1
                       CASE WHEN effective_date = LAG(termination_date) OVER (PARTITION BY person_id ORDER BY effective_date) + 1 THEN 0 ELSE 1 END new_grp_start
                FROM   tbl_majors));

 PERSON_ID MAJOR       EFFECTIVE_DATE TERMINATION_DATE ACTIVE_THROUGHOUT_2017
---------- ----------- -------------- ---------------- ----------------------
        58 Literature  01/08/2016     13/05/2017                            0
        76 Accounting  25/02/2017     15/04/2017                            1
        76 Biology     01/08/2016     24/02/2017                            1
        76 Math        16/04/2017     31/12/9999                            1
        94 Art History 01/08/2016     30/08/2017                            0
        94 Comp Sci    01/03/2018     31/12/9999                            0
       102 Physics     01/08/2016     21/07/2018                            1

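First, work out whether the current row and the previous row are contiguous, i.e. whether the current row's effective_date is exactly one day after the previous row's termination_date.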
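Outside the database, this first step can be sketched in Python roughly as follows. `ROWS` and `new_group_flags` are hypothetical names; the function mimics `CASE WHEN effective_date = LAG(termination_date) ... + 1 THEN 0 ELSE 1 END` per person, ordered by effective date. As in SQL, the first row of each person has no previous row (LAG returns NULL), so it always starts a new group.

```python
from datetime import date, timedelta
from itertools import groupby

ONE_DAY = timedelta(days=1)

# Sample TBL_MAJORS rows: (person_id, major, effective_date, termination_date)
ROWS = [
    (76, "Biology", date(2016, 8, 1), date(2017, 2, 24)),
    (76, "Accounting", date(2017, 2, 25), date(2017, 4, 15)),
    (76, "Math", date(2017, 4, 16), date(9999, 12, 31)),
    (102, "Physics", date(2016, 8, 1), date(2018, 7, 21)),
    (58, "Literature", date(2016, 8, 1), date(2017, 5, 13)),
    (94, "Art History", date(2016, 8, 1), date(2017, 8, 30)),
    (94, "Comp Sci", date(2018, 3, 1), date(9999, 12, 31)),
]

def new_group_flags(rows):
    """Return {(person_id, major): new_grp_start} with 0 = contiguous with the
    previous row of the same person, 1 = starts a new group."""
    flags = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # PARTITION BY person_id ORDER BY effective_date
    for person_id, group in groupby(ordered, key=lambda r: r[0]):
        prev_term = None  # LAG() is NULL for the first row of each partition
        for _, major, eff, term in group:
            # eff - 1 day == prev_term is the same test as eff == prev_term + 1,
            # but cannot overflow when prev_term is date(9999, 12, 31).
            contiguous = prev_term is not None and eff - ONE_DAY == prev_term
            flags[(person_id, major)] = 0 if contiguous else 1
            prev_term = term
    return flags

for (person_id, major), flag in new_group_flags(ROWS).items():
    print(person_id, major, flag)
```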

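Once we know whether a row is contiguous with the previous one, we know whether it starts a new group. We can then take a running sum of these new-group flags: all contiguous rows receive the same number, and the number increases each time a new group of rows starts.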
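This running-sum trick is commonly known as the "gaps and islands" technique. A minimal Python sketch, using a hypothetical `group_numbers` helper that takes one person's flags already ordered by effective date, shows how the cumulative sum turns start-of-group flags into group numbers (the last call uses made-up flags for a person with one gap in the middle):

```python
from itertools import accumulate

def group_numbers(new_grp_start):
    """Running total of the new-group flags, like
    SUM(new_grp_start) OVER (PARTITION BY person_id ORDER BY effective_date)."""
    return list(accumulate(new_grp_start))

print(group_numbers([1, 0, 0]))     # person 76: one group   -> [1, 1, 1]
print(group_numbers([1, 1]))        # person 94: two groups  -> [1, 2]
print(group_numbers([1, 0, 1, 0]))  # hypothetical: gap after row 2 -> [1, 1, 2, 2]
```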

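Then, for each person and group, compute the minimum effective_date and the maximum termination_date. Checking whether that range covers the whole of 2017 (it starts on or before the first day of 2017 and ends on or after the last day) is then a simple matter.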
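Note that the final check is containment rather than mere overlap: the group's range must include both 1 January and 31 December 2017. A Python sketch of that comparison for one already-formed group (the helper name `island_covers_year` is hypothetical):

```python
from datetime import date

YEAR_START = date(2017, 1, 1)
YEAR_END = date(2017, 12, 31)

def island_covers_year(island):
    """island: (effective_date, termination_date) rows forming ONE group of
    contiguous rows. Returns 1 if the group spans all of 2017, else 0."""
    start = min(eff for eff, _ in island)   # MIN(effective_date) per person and group
    end = max(term for _, term in island)   # MAX(termination_date) per person and group
    return 1 if start <= YEAR_START and end >= YEAR_END else 0

person_76_island = [
    (date(2016, 8, 1), date(2017, 2, 24)),
    (date(2017, 2, 25), date(2017, 4, 15)),
    (date(2017, 4, 16), date(9999, 12, 31)),
]
person_94_islands = [
    [(date(2016, 8, 1), date(2017, 8, 30))],
    [(date(2018, 3, 1), date(9999, 12, 31))],
]

print(island_covers_year(person_76_island))               # 1
print([island_covers_year(i) for i in person_94_islands])  # [0, 0]
```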

ETA: here is a solution based on GSazheniuk's suggestion:

WITH tbl_majors AS (SELECT 76 person_id, 'Biology' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('24/02/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 76 person_id, 'Accounting' major, to_date('25/02/2017', 'dd/mm/yyyy') effective_date, to_date('15/04/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 76 person_id, 'Math' major, to_date('16/04/2017', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 102 person_id, 'Physics' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('21/07/2018', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 58 person_id, 'Literature' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('13/05/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 94 person_id, 'Art History' major, to_date('01/08/2016', 'dd/mm/yyyy') effective_date, to_date('30/08/2017', 'dd/mm/yyyy') termination_date FROM dual UNION ALL
                    SELECT 94 person_id, 'Comp Sci' major, to_date('01/03/2018', 'dd/mm/yyyy') effective_date, to_date('31/12/9999', 'dd/mm/yyyy') termination_date FROM dual)
SELECT person_id,
       major,
       effective_date,
       termination_date,
       -- days of 2017 covered by each row, clamped at 0 so that rows lying entirely outside 2017 do not subtract days
       CASE WHEN SUM(GREATEST(0, LEAST(to_date('31/12/2017', 'dd/mm/yyyy'), termination_date) + 1 - GREATEST(to_date('01/01/2017', 'dd/mm/yyyy'), effective_date))) OVER (PARTITION BY person_id)
                   = to_date('31/12/2017', 'dd/mm/yyyy') + 1 - to_date('01/01/2017', 'dd/mm/yyyy')
                 THEN 1
            ELSE 0
       END active_throughout_2017
FROM   tbl_majors;

As you can see, it is more elegant than my original solution. You should test both to see which one performs better on your data.
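The idea behind this version: count how many days of 2017 each row covers, add them up per person, and compare the total with the number of days in 2017. A Python sketch of that arithmetic (hypothetical names `days_in_2017` and `active_throughout_2017`) clamps each row's contribution at zero; without the clamp, a row lying entirely outside 2017, such as person 94's Comp Sci row, would contribute a negative count (here -59 days).

```python
from collections import defaultdict
from datetime import date

YEAR_START = date(2017, 1, 1)
YEAR_END = date(2017, 12, 31)
DAYS_IN_YEAR = (YEAR_END - YEAR_START).days + 1  # 365

# Sample TBL_MAJORS rows: (person_id, major, effective_date, termination_date)
ROWS = [
    (76, "Biology", date(2016, 8, 1), date(2017, 2, 24)),
    (76, "Accounting", date(2017, 2, 25), date(2017, 4, 15)),
    (76, "Math", date(2017, 4, 16), date(9999, 12, 31)),
    (102, "Physics", date(2016, 8, 1), date(2018, 7, 21)),
    (58, "Literature", date(2016, 8, 1), date(2017, 5, 13)),
    (94, "Art History", date(2016, 8, 1), date(2017, 8, 30)),
    (94, "Comp Sci", date(2018, 3, 1), date(9999, 12, 31)),
]

def days_in_2017(eff, term):
    """Days of 2017 covered by one row: LEAST(year_end, term) + 1 - GREATEST(year_start, eff),
    clamped at 0 for rows that do not touch 2017."""
    covered = (min(YEAR_END, term) - max(YEAR_START, eff)).days + 1
    return max(covered, 0)

def active_throughout_2017(rows):
    """1 if a person's rows add up to every day of 2017, else 0.
    Only valid when a person's rows never overlap."""
    totals = defaultdict(int)
    for person_id, _, eff, term in rows:
        totals[person_id] += days_in_2017(eff, term)
    return {pid: int(total == DAYS_IN_YEAR) for pid, total in totals.items()}

print(active_throughout_2017(ROWS))  # {76: 1, 102: 1, 58: 0, 94: 0}
```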

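Both solutions assume there are no overlaps (i.e. a student has only one major at any given moment); you would have to adjust them to handle cases where someone can have more than one major at a time.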
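One possible adjustment, sketched here in Python as an assumption rather than as part of the original answer: first merge each person's overlapping or back-to-back periods into maximal continuous ranges (standard interval merging), then check whether any merged range contains all of 2017. This avoids the day-sum approach counting overlapping days twice. The helpers `merge_periods` and `covers` and the double-major data are hypothetical.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def merge_periods(periods):
    """Merge overlapping or back-to-back (effective_date, termination_date)
    periods of one person into maximal continuous ranges."""
    merged = []
    for eff, term in sorted(periods):
        # eff - 1 day <= current end: the period overlaps or directly follows it.
        if merged and eff - ONE_DAY <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], term)
        else:
            merged.append([eff, term])
    return [tuple(p) for p in merged]

def covers(periods, start, end):
    """True if one merged range contains every day from start to end."""
    return any(s <= start and e >= end for s, e in merge_periods(periods))

# Hypothetical double major: overlapping periods that jointly cover 2017.
double_major = [
    (date(2016, 8, 1), date(2017, 6, 30)),
    (date(2017, 3, 1), date(2018, 5, 31)),
    (date(2017, 4, 1), date(2017, 5, 1)),  # nested inside the ranges above
]

print(merge_periods(double_major))  # [(datetime.date(2016, 8, 1), datetime.date(2018, 5, 31))]
print(covers(double_major, date(2017, 1, 1), date(2017, 12, 31)))  # True
```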