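I have two dimension tables in SCD Type 2 format that I want to merge. The second table contains additional rows that affect the final structure of the first table.
First table (foo):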
| employee_id | location_id | team_id | date_effective | date_expired |
|-------------|-------------|---------|----------------|--------------|
| 40 | 1 | 6 | 20180101 | 20190331 |
| 40 | 2 | 6 | 20190331 | 99991231 |
Second table (bar):
| team_id | manager_id | date_effective | date_expired |
|---------|------------|----------------|--------------|
| 6 | 15 | 20180301 | 20180630 |
| 6 | 27 | 20180630 | 99991231 |
Desired output after merging:
| employee_id | location_id | team_id | manager_id | date_effective | date_expired |
|-------------|-------------|---------|------------|----------------|--------------|
| 40 | 1 | 6 | NULL | 20180101 | 20180301 |
| 40 | 1 | 6 | 15 | 20180301 | 20180630 |
| 40 | 1 | 6 | 27 | 20180630 | 20190331 |
| 40 | 2 | 6 | 27 | 20190331 | 99991231 |
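I know how to join the two tables by date, but I don't know how to efficiently generate the extra rows that the output needs. Here is my current code: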
with foo as (
select *
from
(values (40,1,6,20180101,20190331),(40,2,6,20190331,99991231))t(employee_id, location_id, team_id, date_effective, date_expired)
)
,bar as (
select *
from
(values (6,15,20180301,20180630),(6,27,20180630,99991231))t(team_id, manager_id, date_effective, date_expired)
)
select *
from foo f
left join bar b on f.team_id = b.team_id
and ((f.date_effective between b.date_effective and b.date_expired)
or (b.date_effective >= f.date_effective and b.date_effective < f.date_expired))
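I know I could get the result by expanding each table to one row per day and applying some window functions, but I'd like to know whether there is a more efficient way.
Thanks!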
Answer 0 (score: 0)
One possible solution is to collect the distinct effective dates per team from foo and bar, and then join those dates back to foo and bar.
with foo as (
select *
from
(values (40,1,6,20180101,20190331),(40,2,6,20190331,99991231))t(employee_id, location_id, team_id, date_effective, date_expired)
)
,bar as (
select *
from
(values (6,15,20180301,20180630),(6,27,20180630,99991231))t(team_id, manager_id, date_effective, date_expired)
)
,dist as (
select date_effective, team_id
from foo
union
select date_effective, team_id
from bar
)
select *
from dist d
left join foo f on d.team_id = f.team_id and d.date_effective >= f.date_effective and d.date_effective < f.date_expired
left join bar b on d.team_id = b.team_id and d.date_effective >= b.date_effective and d.date_effective < b.date_expired
order by 1
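The query above returns one row per boundary date but does not compute when each slice ends. The following is a minimal sketch of one way to extend the same idea, not the answerer's code: it also collects the expiry dates and uses `LEAD` to pair each boundary with the next one. The CTE names `boundaries` and `slices` are hypothetical, and the sketch assumes PostgreSQL and half-open ranges (effective date inclusive, expiry date exclusive), as in the sample data.

```sql
WITH foo(employee_id, location_id, team_id, date_effective, date_expired) AS (
    VALUES (40, 1, 6, 20180101, 20190331),
           (40, 2, 6, 20190331, 99991231)
),
bar(team_id, manager_id, date_effective, date_expired) AS (
    VALUES (6, 15, 20180301, 20180630),
           (6, 27, 20180630, 99991231)
),
-- Every date at which something starts or ends for a team (hypothetical helper CTE).
boundaries AS (
    SELECT team_id, date_effective AS boundary FROM foo
    UNION
    SELECT team_id, date_expired FROM foo
    UNION
    SELECT team_id, date_effective FROM bar
    UNION
    SELECT team_id, date_expired FROM bar
),
-- Consecutive boundaries form half-open slices [date_effective, date_expired).
slices AS (
    SELECT team_id,
           boundary AS date_effective,
           LEAD(boundary) OVER (PARTITION BY team_id ORDER BY boundary) AS date_expired
    FROM boundaries
)
SELECT f.employee_id, f.location_id, f.team_id, b.manager_id,
       s.date_effective, s.date_expired
FROM slices s
-- Keep only slices that fall inside an employee row.
JOIN foo f
  ON f.team_id = s.team_id
 AND s.date_effective >= f.date_effective
 AND s.date_effective <  f.date_expired
-- The manager is optional, so slices without one get NULL.
LEFT JOIN bar b
  ON b.team_id = s.team_id
 AND s.date_effective >= b.date_effective
 AND s.date_effective <  b.date_expired
-- The last boundary has no successor and therefore opens no slice.
WHERE s.date_expired IS NOT NULL
ORDER BY f.employee_id, s.date_effective;
```

Because the slices are built per team, a team with several employees would have each employee's rows split at the other employees' change dates too. That can produce adjacent rows with identical attributes, which would need to be merged in a further step.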
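Answer 1 (score: 0)
I would approach this by creating one CTE for the before range, where one SCD table has data and the other does not, and one for the overlapping range, where both tables have data. Likewise, if one table stopped recording history after some point, I would create an after range.
Taking the union of the before and overlapping CTEs then gives the desired output.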
WITH foo(employee_id, location_id, team_id, date_effective, date_expired) AS ( VALUES
(40,1,6,'2018-01-01'::TIMESTAMP,'2019-03-31'::TIMESTAMP),
(40,2,6,'2019-03-31','9999-12-31')
)
, bar(team_id, manager_id, date_effective, date_expired) AS( VALUES
(6,15,'2018-03-01'::TIMESTAMP,'2018-06-30'::TIMESTAMP),
(6,27,'2018-06-30','9999-12-31')
)
, overlapping AS (
SELECT
team_id
, employee_id
, location_id
, manager_id
, GREATEST(foo.date_effective, bar.date_effective) date_effective
, LEAST(foo.date_expired, bar.date_expired) date_expired
FROM foo JOIN bar USING (team_id)
WHERE tsrange(foo.date_effective, foo.date_expired) && tsrange(bar.date_effective, bar.date_expired)
)
, before AS (
SELECT
team_id
, employee_id
, location_id
, NULL::INTEGER manager_id
, MIN(foo.date_effective) date_effective
, MIN(bar.date_effective) date_expired
FROM foo
LEFT JOIN bar USING (team_id)
GROUP BY team_id, employee_id, location_id
HAVING NOT EXISTS (SELECT FROM overlapping WHERE overlapping.date_effective = MIN(foo.date_effective) AND overlapping.team_id = foo.team_id)
)
SELECT * FROM before
UNION ALL
SELECT * FROM overlapping
ORDER BY 5
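This gives the output:
| team_id | employee_id | location_id | manager_id | date_effective | date_expired |
|---------|-------------|-------------|------------|----------------|--------------|
| 6 | 40 | 1 | NULL | 2018-01-01 | 2018-03-01 |
| 6 | 40 | 1 | 15 | 2018-03-01 | 2018-06-30 |
| 6 | 40 | 1 | 27 | 2018-06-30 | 2019-03-31 |
| 6 | 40 | 2 | 27 | 2019-03-31 | 9999-12-31 |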
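The after range mentioned in this answer is not shown in the query. The following is a rough sketch of what it might look like, under the assumption that bar stops recording history before foo does. The sample data is a hypothetical variation in which the manager history ends on 2018-12-31, and the `last_bar` CTE is a helper name introduced here for illustration.

```sql
WITH foo(employee_id, location_id, team_id, date_effective, date_expired) AS ( VALUES
      (40,1,6,'2018-01-01'::TIMESTAMP,'2019-03-31'::TIMESTAMP),
      (40,2,6,'2019-03-31','9999-12-31')
)
-- Hypothetical variation: the manager history stops at 2018-12-31.
, bar(team_id, manager_id, date_effective, date_expired) AS ( VALUES
      (6,15,'2018-03-01'::TIMESTAMP,'2018-06-30'::TIMESTAMP),
      (6,27,'2018-06-30','2018-12-31')
)
-- Last point in time for which each team has manager data (hypothetical helper CTE).
, last_bar AS (
    SELECT team_id, MAX(date_expired) AS last_expired
    FROM bar
    GROUP BY team_id
)
-- Part of each foo row that extends past the end of the manager history.
, after AS (
    SELECT
          team_id
        , employee_id
        , location_id
        , NULL::INTEGER AS manager_id
        , GREATEST(foo.date_effective, last_bar.last_expired) AS date_effective
        , foo.date_expired
    FROM foo
    JOIN last_bar USING (team_id)
    WHERE foo.date_expired > last_bar.last_expired
)
SELECT * FROM after
ORDER BY date_effective;
```

The columns are in the same order as in the before and overlapping CTEs, so the result could be appended to the final query with another `UNION ALL`. Gaps in the middle of the manager history are not covered by this sketch.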