Merge consecutive duplicate records, including their date ranges

Date: 2018-05-11 10:47:20

Tags: db2

I have a problem very similar to the one asked here: Merge duplicate temporal records in database

The difference is that I need the end date to be an actual date rather than NULL.

So, given the following data:

EmployeeId   StartDate   EndDate     Column1   Column2
1000         2009/05/01  2010/04/30   X         Y
1000         2010/05/01  2011/04/30   X         Y
1000         2011/05/01  2012/04/30   X         X
1000         2012/05/01  2013/04/30   X         Y
1000         2013/05/01  2014/04/30   X         X
1000         2014/05/01  2014/06/01   X         X

The desired result is:

EmployeeId   StartDate   EndDate     Column1   Column2
1000         2009/05/01  2011/04/30   X         Y
1000         2011/05/01  2012/04/30   X         X
1000         2012/05/01  2013/04/30   X         Y
1000         2013/05/01  2014/06/01   X         X

The solution proposed in the linked thread is:

with  t1 as  --tag first row with 1 in a continuous time series
(
select t1.*, case when t1.column1=t2.column1 and t1.column2=t2.column2
                  then 0 else 1 end as tag
  from test_table t1
  left join test_table t2
    on t1.EmployeeId= t2.EmployeeId and dateadd(day,-1,t1.StartDate)= t2.EndDate
)
select t1.EmployeeId, t1.StartDate, 
       case when min(T2.StartDate) is null then null
            else dateadd(day,-1,min(T2.StartDate)) end as EndDate,
       t1.Column1, t1.Column2
  from (select t1.* from t1 where tag=1 ) as t1  -- to get StartDate
  left join (select t1.* from t1 where tag=1 ) as t2  -- to get a new EndDate
    on t1.EmployeeId= t2.EmployeeId and t1.StartDate < t2.StartDate
 group by t1.EmployeeId, t1.StartDate, t1.Column1,   t1.Column2;
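
The NULL comes from how the query above computes EndDate: for each row tagged 1 it takes the earliest StartDate of a later tagged row and subtracts one day, so the last run, which has no later tagged row, falls into the "min(T2.StartDate) is null then null" branch. The Python sketch below is not DB2 code; it only mimics that logic on the sample data. The names run_starts, next_start, merge_linked and merge_with_real_end are hypothetical. The last function shows one possible fix under the same run detection: take the largest EndDate among the rows of each run instead of deriving it from the next run.

```python
from datetime import date, timedelta

# Sample data from the question: (EmployeeId, StartDate, EndDate, Column1, Column2)
ROWS = [
    (1000, date(2009, 5, 1), date(2010, 4, 30), "X", "Y"),
    (1000, date(2010, 5, 1), date(2011, 4, 30), "X", "Y"),
    (1000, date(2011, 5, 1), date(2012, 4, 30), "X", "X"),
    (1000, date(2012, 5, 1), date(2013, 4, 30), "X", "Y"),
    (1000, date(2013, 5, 1), date(2014, 4, 30), "X", "X"),
    (1000, date(2014, 5, 1), date(2014, 6, 1), "X", "X"),
]


def run_starts(rows):
    """Rows with tag = 1: no row ends the day before, or that row has different
    (Column1, Column2). Caveat: in Python None == None is True, whereas in SQL
    NULL = NULL is not true, so NULL values would behave differently here."""
    by_end = {(r[0], r[2]): r for r in rows}
    starts = []
    for r in rows:
        prev = by_end.get((r[0], r[1] - timedelta(days=1)))
        if prev is None or prev[3:] != r[3:]:
            starts.append(r)
    return starts


def next_start(starts, r):
    """Earliest StartDate of a later tagged row for the same employee, or None."""
    later = [s[1] for s in starts if s[0] == r[0] and s[1] > r[1]]
    return min(later) if later else None


def merge_linked(rows):
    """Linked-thread logic: EndDate = next run's StartDate - 1 day, else None."""
    starts = run_starts(rows)
    result = []
    for r in starts:
        nxt = next_start(starts, r)
        end = nxt - timedelta(days=1) if nxt is not None else None
        result.append((r[0], r[1], end, r[3], r[4]))
    return result


def merge_with_real_end(rows):
    """Hypothetical fix: EndDate = largest EndDate among the run's own rows."""
    starts = run_starts(rows)
    result = []
    for r in starts:
        nxt = next_start(starts, r)
        ends = [x[2] for x in rows
                if x[0] == r[0] and x[1] >= r[1] and (nxt is None or x[1] < nxt)]
        result.append((r[0], r[1], max(ends), r[3], r[4]))
    return result


print(merge_linked(ROWS)[-1])         # last run: EndDate is None
print(merge_with_real_end(ROWS)[-1])  # last run: EndDate is 2014-06-01
```

In SQL terms, the same idea means joining each run back to its member rows (StartDate from the run start up to, but excluding, the next run start) and taking MAX(EndDate) there.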

However, this does not seem to work when you need an actual end date instead of NULL.

Can anyone help me with this?

1 answer:

Answer (score: 0):

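Here is another solution (taken from How do I group on continuous ranges). It is simpler to write, and it also copes with NULL values: rows whose Column1 or Column2 is NULL are grouped together, whereas a simple LAG() comparison would not treat NULL = NULL as a match. However, because of the GROUP BY, it may not be efficient for large amounts of data.
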
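SELECT EmployeeId
,      MIN(StartDate) AS StartDate  -- first day of the merged run
,      MAX(EndDate)   AS EndDate    -- last day of the run, taken from the data instead of computed
,      Column1
,      Column2
FROM
(
    SELECT t.*
           -- GRN: row number among this employee's rows with the same (Column1, Column2)
    ,      ROW_NUMBER() OVER(PARTITION BY EmployeeId, Column1, Column2 ORDER BY StartDate ) AS GRN
           -- RN: row number among all of this employee's rows
    ,      ROW_NUMBER() OVER(PARTITION BY EmployeeId                   ORDER BY StartDate ) AS RN
    FROM
           test_table t
    ) t
GROUP BY
       EmployeeId
,      Column1
,      Column2
,      RN - GRN                     -- constant within each run of consecutive identical rows
ORDER BY EmployeeId
,      StartDate;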
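
Why grouping on RN - GRN works: within a run of consecutive rows with the same Column1/Column2, RN and GRN both increase by one per row, so their difference stays constant; when the values change and later come back, RN has advanced further than GRN, so the difference changes and a new group starts. The Python sketch below replays the query on the sample data from the question. It is an illustration, not DB2 code, and the tuple layout and the name merge_islands are assumptions. Like the query, it merges rows that are adjacent in StartDate order and does not check that one period ends the day before the next begins.

```python
from datetime import date
from itertools import groupby

# Sample data from the question: (EmployeeId, StartDate, EndDate, Column1, Column2)
ROWS = [
    (1000, date(2009, 5, 1), date(2010, 4, 30), "X", "Y"),
    (1000, date(2010, 5, 1), date(2011, 4, 30), "X", "Y"),
    (1000, date(2011, 5, 1), date(2012, 4, 30), "X", "X"),
    (1000, date(2012, 5, 1), date(2013, 4, 30), "X", "Y"),
    (1000, date(2013, 5, 1), date(2014, 4, 30), "X", "X"),
    (1000, date(2014, 5, 1), date(2014, 6, 1), "X", "X"),
]


def merge_islands(rows):
    """Group rows by (EmployeeId, Column1, Column2, RN - GRN) and keep
    MIN(StartDate) and MAX(EndDate) per group, mirroring the query."""
    groups = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for emp, emp_rows in groupby(ordered, key=lambda r: r[0]):
        grn = {}  # running GRN per (Column1, Column2) value pair
        for rn, r in enumerate(emp_rows, start=1):  # rn plays the role of RN
            vals = (r[3], r[4])
            grn[vals] = grn.get(vals, 0) + 1
            groups.setdefault((emp, vals, rn - grn[vals]), []).append(r)
    merged = [
        (emp, min(x[1] for x in g), max(x[2] for x in g), vals[0], vals[1])
        for (emp, vals, _), g in groups.items()
    ]
    return sorted(merged, key=lambda r: (r[0], r[1]))


for row in merge_islands(ROWS):
    print(row)
```

For the sample data this yields the four rows of the desired result. Because (Column1, Column2) is part of the dictionary key, None values group together, much as GROUP BY puts NULLs into one group.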