Selecting the correct effective dates in a self-join

Time: 2019-09-06 19:54:20

Tags: postgresql date-range self-join

I am trying to build an SCD Type 2 employee-manager relationship table. I have set up the base table:

| emp_id | manager_id | is_emp_self_managed | date_effective | date_expired |
|--------|------------|---------------------|----------------|--------------|
| 2      |            | TRUE                | 2004-04-01     | 2013-02-01   |
| 2      | 10         | FALSE               | 2013-02-01     | 2019-04-01   |
| 5      | 2          | FALSE               | 2005-12-01     | 2013-04-11   |
| 10     |            | TRUE                | 2013-02-01     | 2019-04-01   |

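From this data, I want to add an extra self-referencing column, is_manager_self_managed. When I do a self-join, I get the following (the date columns are shown as daterange values for illustration):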

| emp_id | is_emp_self_managed | manager_id | is_manager_self_managed | emp_range                 | man_range               |
|--------|---------------------|------------|-------------------------|---------------------------|-------------------------|
| 2      | TRUE                |            | TRUE                    | [2004-04-01,2013-02-01)   | [2004-04-01,2013-02-01) |
| 2      | FALSE               | 10         | TRUE                    | [2013-02-01,2019-04-01)   | [2013-02-01,2019-04-01) |
| 5      | FALSE               | 2          | TRUE                    | *[2005-12-01,2013-04-11)* | [2004-04-01,2013-02-01) |
| 5      | FALSE               | 2          | FALSE                   | *[2005-12-01,2013-04-11)* | [2013-02-01,2019-04-01) |
| 10     | TRUE                |            | TRUE                    | [2013-02-01,2019-04-01)   | [2013-02-01,2019-04-01) |

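The self-join across date ranges gives emp_id = 5 an extra row, because manager_id = 2 switched from self-managed to not self-managed partway through emp_id = 5's range. However, I now have to resolve the conflicting date ranges that come back. Ultimately, emp_id = 5 should still start and end on its own effective dates, but the manager's change has to split that period into new, non-overlapping date ranges.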
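
To make the target concrete: the two rows wanted for emp_id = 5 are the intersections of the employee's range with each of the manager's ranges. A minimal illustration, assuming PostgreSQL's built-in range intersection operator `*` on `daterange` values (the literal dates are copied from the tables above; the column aliases are only for illustration):

```sql
-- Intersect emp_id = 5's range with each of manager_id = 2's two ranges.
select daterange(date '2005-12-01', date '2013-04-11')
         * daterange(date '2004-04-01', date '2013-02-01') as first_slice
     , daterange(date '2005-12-01', date '2013-04-11')
         * daterange(date '2013-02-01', date '2019-04-01') as second_slice;
-- first_slice:  [2005-12-01,2013-02-01)
-- second_slice: [2013-02-01,2013-04-11)
```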

Query that produces the joined output:

```sql
-- Sample data: one row per employee per effective period
with emp_data as (
    select *
    from (
        values (2, '2004-04-01'::date, '2013-02-01'::date, true, null)
             , (2, '2013-02-01'::date, '2019-04-01'::date, false, 10)
             , (5, '2005-12-01'::date, '2013-04-11'::date, false, 2)
             , (10, '2013-02-01'::date, '2019-04-01'::date, true, null)
    ) t(emp_id, date_effective, date_expired, is_emp_self_managed, manager_id)
)
select t1.emp_id
     , t1.is_emp_self_managed
     , t1.manager_id
     , t2.is_emp_self_managed as is_manager_self_managed
     , daterange(t1.date_effective, t1.date_expired) as emp_range
     , daterange(t2.date_effective, t2.date_expired) as man_range
from emp_data t1
-- Self-managed employees (null manager_id) join to their own rows
left join emp_data t2
       on coalesce(t1.manager_id, t1.emp_id) = t2.emp_id
      -- keep only manager rows whose period overlaps the employee's period
      and ((t1.date_effective >= t2.date_effective and t1.date_effective < t2.date_expired)
        or (t2.date_effective >= t1.date_effective and t2.date_effective < t1.date_expired))
order by t1.emp_id, t1.date_effective, t2.date_effective;
```
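
The pair of comparisons in the join condition is an overlap test on half-open intervals. As a hedged sketch, the same condition could be written with PostgreSQL's range overlap operator `&&`. This assumes every row has a non-null date_effective that is earlier than date_expired: `daterange` treats a null bound as unbounded, which differs from the comparison version, and it raises an error if the lower bound is after the upper bound.

```sql
with emp_data as (
    select *
    from (
        values (2, '2004-04-01'::date, '2013-02-01'::date, true, null)
             , (2, '2013-02-01'::date, '2019-04-01'::date, false, 10)
             , (5, '2005-12-01'::date, '2013-04-11'::date, false, 2)
             , (10, '2013-02-01'::date, '2019-04-01'::date, true, null)
    ) t(emp_id, date_effective, date_expired, is_emp_self_managed, manager_id)
)
select t1.emp_id
     , t1.manager_id
     , t2.is_emp_self_managed as is_manager_self_managed
     , daterange(t1.date_effective, t1.date_expired) as emp_range
     , daterange(t2.date_effective, t2.date_expired) as man_range
from emp_data t1
left join emp_data t2
       on coalesce(t1.manager_id, t1.emp_id) = t2.emp_id
      -- && is true when the two half-open ranges share at least one day
      and daterange(t1.date_effective, t1.date_expired)
          && daterange(t2.date_effective, t2.date_expired)
order by t1.emp_id, t1.date_effective, t2.date_effective;
```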

The ideal output would look like this:

| emp_id | is_emp_self_managed | manager_id | is_manager_self_managed | date_effective | date_expired |
|--------|---------------------|------------|-------------------------|----------------|--------------|
| 2      | TRUE                |            | TRUE                    | 2004-04-01     | 2013-02-01   |
| 2      | FALSE               | 10         | TRUE                    | 2013-02-01     | 2019-04-01   |
| 5      | FALSE               | 2          | TRUE                    | *2005-12-01*   | *2013-02-01* |
| 5      | FALSE               | 2          | FALSE                   | *2013-02-01*   | *2013-04-11* |
| 10     | TRUE                |            | TRUE                    | 2013-02-01     | 2019-04-01   |

1 answer:

Answer 0 (score: 0)

I just realized this might work:

```sql
with emp_data as (
    select *
    from (
        values (2, '2004-04-01'::date, '2013-02-01'::date, true, null)
             , (2, '2013-02-01'::date, '2019-04-01'::date, false, 10)
             , (5, '2005-12-01'::date, '2013-04-11'::date, false, 2)
             , (10, '2013-02-01'::date, '2019-04-01'::date, true, null)
    ) t(emp_id, date_effective, date_expired, is_emp_self_managed, manager_id)
)
select t1.emp_id
     , t1.is_emp_self_managed
     , t1.manager_id
     , t2.is_emp_self_managed as is_manager_self_managed
     -- Added CASE expressions.
     -- Start date: use the employee's start when it falls inside the manager's
     -- range (and the manager's start is not inside the employee's range);
     -- otherwise use the manager's start.
     , case
           when t1.date_effective <@ daterange(t2.date_effective, t2.date_expired)
            and not t2.date_effective <@ daterange(t1.date_effective, t1.date_expired)
           then t1.date_effective
           else t2.date_effective
       end as date_effective
     -- End date: same idea, applied to the expiry dates.
     , case
           when t1.date_expired <@ daterange(t2.date_effective, t2.date_expired)
            and not t2.date_expired <@ daterange(t1.date_effective, t1.date_expired)
           then t1.date_expired
           else t2.date_expired
       end as date_expired
     , daterange(t1.date_effective, t1.date_expired) as emp_range
     , daterange(t2.date_effective, t2.date_expired) as man_range
from emp_data t1
left join emp_data t2
       on coalesce(t1.manager_id, t1.emp_id) = t2.emp_id
      and ((t1.date_effective >= t2.date_effective and t1.date_effective < t2.date_expired)
        or (t2.date_effective >= t1.date_effective and t2.date_effective < t1.date_expired))
order by t1.emp_id, t1.date_effective, t2.date_effective;
```
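
Because the join only keeps overlapping pairs, the two CASE expressions amount to taking the later start date and the earlier end date, i.e. the bounds of the intersection. A more compact sketch of the same idea, assuming PostgreSQL's `greatest` and `least` (which ignore NULL arguments, so a row with no matching manager period would keep the employee's own dates instead of returning NULL as the CASE version does):

```sql
with emp_data as (
    select *
    from (
        values (2, '2004-04-01'::date, '2013-02-01'::date, true, null)
             , (2, '2013-02-01'::date, '2019-04-01'::date, false, 10)
             , (5, '2005-12-01'::date, '2013-04-11'::date, false, 2)
             , (10, '2013-02-01'::date, '2019-04-01'::date, true, null)
    ) t(emp_id, date_effective, date_expired, is_emp_self_managed, manager_id)
)
select t1.emp_id
     , t1.is_emp_self_managed
     , t1.manager_id
     , t2.is_emp_self_managed as is_manager_self_managed
     -- later of the two start dates
     , greatest(t1.date_effective, t2.date_effective) as date_effective
     -- earlier of the two end dates
     , least(t1.date_expired, t2.date_expired) as date_expired
from emp_data t1
left join emp_data t2
       on coalesce(t1.manager_id, t1.emp_id) = t2.emp_id
      and ((t1.date_effective >= t2.date_effective and t1.date_effective < t2.date_expired)
        or (t2.date_effective >= t1.date_effective and t2.date_effective < t1.date_expired))
order by t1.emp_id, date_effective;
```

On the sample data this returns the five rows of the ideal output in the question, with emp_id = 5 split at 2013-02-01.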