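Find overlapping date ranges and remove the duplicates from fact tables

Time: 2018-06-27 15:59:04

Tags: sql sql-server

My table below has overlapping date ranges. I need to identify the overlapping records and remove them, both from this table and from the other fact tables that reference them.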

Current output:

select pcode,pkey, RowEffectiveDate ,rowenddate from dimP
where pcode='KO18'

Pcode   Pkey      RowEffectiveDate        rowenddate
KO18    3023    1900-01-01 00:00:00.000  2017-12-06 23:59:59.997
KO18    6328    2017-12-07 00:00:00.000  2018-01-29 23:59:59.997
KO18    8550    2018-01-30 00:00:00.000  2018-02-09 23:59:59.997
KO18    8847    2018-02-10 00:00:00.000  2018-04-24 23:59:59.997
KO18    8848    2018-02-10 00:00:00.000  2018-04-25 23:59:59.997
KO18    8896    2018-02-10 00:00:00.000  2018-04-26 23:59:59.997
KO18    8897    2018-02-10 00:00:00.000  2018-04-29 23:59:59.997
KO18    11506   2018-04-25 00:00:00.000  2018-04-25 23:59:59.997
KO18    11817   2018-04-26 00:00:00.000  2018-04-26 23:59:59.997
KO18    11825   2018-04-27 00:00:00.000  2018-04-29 23:59:59.997
KO18    11849   2018-04-30 00:00:00.000  9999-12-31 00:00:00.000

Expected output 1: identify the overlapping Pkeys

Pcode Pkeytobeaffected  PkeytobeRetained      RowEffectiveDate        rowenddate
KO18    3023                3023            1900-01-01 00:00:00.000  2017-12-06 23:59:59.997
KO18    6328                6328            2017-12-07 00:00:00.000  2018-01-29 23:59:59.997
KO18    8550                8550            2018-01-30 00:00:00.000  2018-02-09 23:59:59.997
KO18    8847                8847            2018-02-10 00:00:00.000  2018-04-24 23:59:59.997
KO18    8848                8847            2018-02-10 00:00:00.000  2018-04-25 23:59:59.997
KO18    8896                8847            2018-02-10 00:00:00.000  2018-04-26 23:59:59.997
KO18    8897                8847            2018-02-10 00:00:00.000  2018-04-29 23:59:59.997
KO18    11506               11506           2018-04-25 00:00:00.000  2018-04-25 23:59:59.997
KO18    11817               11817           2018-04-26 00:00:00.000  2018-04-26 23:59:59.997
KO18    11825               11825           2018-04-27 00:00:00.000  2018-04-29 23:59:59.997
KO18    11849               11849           2018-04-30 00:00:00.000  9999-12-31 00:00:00.000

Expected output 2: delete the overlapping Pkeys (I could not get this query to return Pkey; it is only an attempt to show the kind of SQL I need)

select pcode, Min(RowEffectiveDate) RowEffectiveDate, Min(RowEndDate) RowEndDate
from
(
    select *,
        NewStartDate = t.RowEffectiveDate+v.number,
        NewStartDateGroup =
            dateadd(d,
                    1- DENSE_RANK() over (partition by RowEffectiveDate order by t.RowEffectiveDate+v.number),
                    t.RowEffectiveDate+v.number)
    from dimP t
    inner join master..spt_values v
      on v.type='P' and v.number <= DATEDIFF(d, RowEffectiveDate, RowEndDate)
      where PCode='KO18'
) X
group by PCode,RowEffectiveDate, NewStartDateGroup
order by PCode, RowEffectiveDate

Pcode           Pkey                RowEffectiveDate          RowEndDate
KO18            3023            1900-01-01 00:00:00.000     2017-12-06 23:59:59.997
KO18            6328            2017-12-07 00:00:00.000     2018-01-29 23:59:59.997
KO18            8550            2018-01-30 00:00:00.000     2018-02-09 23:59:59.997
KO18            8847            2018-02-10 00:00:00.000     2018-04-24 23:59:59.997
KO18            11506           2018-04-25 00:00:00.000     2018-04-25 23:59:59.997
KO18            11817           2018-04-26 00:00:00.000     2018-04-26 23:59:59.997
KO18            11825           2018-04-27 00:00:00.000     2018-04-29 23:59:59.997
KO18            11849           2018-04-30 00:00:00.000     9999-12-31 00:00:00.000

Expected output 3: also find and delete these Pkeys from the other fact tables.

P.S.: each RowEffectiveDate should be the day after the preceding row's RowEndDate.
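
The day-after rule can be checked directly. The following is only a minimal sketch, assuming SQL Server 2012 or later (for LAG) and that the rows of one Pcode are meant to form a single chain ordered by start date. The alias PrevRowEndDate is introduced here for illustration. Both values are cast to date so that the 23:59:59.997 end times compare by calendar day.

```sql
-- List rows whose start date is not the day after the previous row's end date.
-- PrevRowEndDate is a hypothetical alias used only in this sketch.
SELECT Pcode, Pkey, RowEffectiveDate, RowEndDate, PrevRowEndDate
FROM (
    SELECT Pcode, Pkey, RowEffectiveDate, RowEndDate,
           LAG(RowEndDate) OVER (PARTITION BY Pcode
                                 ORDER BY RowEffectiveDate, RowEndDate, Pkey) AS PrevRowEndDate
    FROM dimP
) AS x
WHERE PrevRowEndDate IS NOT NULL   -- the first row of each Pcode has no predecessor
  AND CAST(RowEffectiveDate AS date) <> DATEADD(DAY, 1, CAST(PrevRowEndDate AS date));
```

A row returned here breaks the chain relative to the row before it, so it may be either a duplicate or the first good row that follows a duplicate. Once the overlapping rows are gone, the query should return nothing for that Pcode.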

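1 answer:

Answer 0 (score: 1)

Use the first_value() function, partitioned by Pcode and RowEffectiveDate and ordered by RowEndDate, so that every row sharing a start date is mapped to the Pkey with the earliest end date: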

select *, first_value(Pkey) over (Partition by Pcode, RowEffectiveDate order by RowEndDate) as PkeytobeRetained
from dimP t;
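
The same expression can also drive the cleanup asked for in expected outputs 2 and 3. What follows is a sketch under stated assumptions, not a tested script: Pkey is assumed to be unique in dimP, #keymap is a temp table introduced here, and factSales stands in for whichever fact tables reference dimP (that name is hypothetical). Pkey is added to the ORDER BY only to make ties deterministic.

```sql
-- Collect the duplicate keys together with the key that survives.
WITH keymap AS (
    SELECT Pkey,
           FIRST_VALUE(Pkey) OVER (PARTITION BY Pcode, RowEffectiveDate
                                   ORDER BY RowEndDate, Pkey) AS PkeytobeRetained
    FROM dimP
)
SELECT Pkey, PkeytobeRetained
INTO #keymap                        -- hypothetical temp table
FROM keymap
WHERE Pkey <> PkeytobeRetained;     -- keep only the rows that will be removed

BEGIN TRANSACTION;

-- Repoint fact rows at the retained key (factSales is a hypothetical name).
UPDATE f
SET f.Pkey = k.PkeytobeRetained
FROM factSales AS f
JOIN #keymap AS k ON k.Pkey = f.Pkey;

-- Remove the duplicate dimension rows.
DELETE d
FROM dimP AS d
JOIN #keymap AS k ON k.Pkey = d.Pkey;

COMMIT TRANSACTION;

DROP TABLE #keymap;
```

Repointing keeps the fact rows. If they really should be deleted, as the question words it, the UPDATE could be swapped for a DELETE with the same join. Either way, running the SELECT from #keymap on its own first shows exactly which keys would be affected. For the KO18 sample those should be 8848, 8896 and 8897, all mapped to 8847.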