Possible Duplicate:
Trying to consolidate employer records who are continuously work for same department
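I am trying to consolidate employee records that have been continuously registered (with gaps of less than 45 days) to a particular department.
Note: if the difference in days between a row's EMP_EFF_TO_DATE and the next row's EMP_EFF_FROM_DATE is less than 45 days, the two rows are considered continuous.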
Sample data:
EMP_ID DEPT_ID EMP_EFF_FROM_DATE EMP_EFF_TO_DATE
-----------------------------------------------------------------------
10 10001 8/1/2008 10/31/2009
10 10001 11/1/2009 2/25/2010
10 10001 2/26/2010 5/1/2011
10 10001 8/1/2011 10/30/2011
10 10001 12/1/2011 10/31/2012
10 10003 7/1/2007 10/31/2007
10 10004 9/27/2004 6/8/2006
10 10004 6/30/2006 6/29/2007
10 10007 6/25/2006 6/20/2007
10 10007 8/25/2007 5/25/2008
Expected result:
EMP_ID DEPT_ID EMP_EFF_FROM_DATE EMP_EFF_TO_DATE
-------------------------------------------------------------------------
10 10001 2008-08-01 2011-05-01
10 10001 2011-08-01 2012-10-31
10 10003 2007-07-01 2007-10-31
10 10004 2004-09-27 2007-06-29
10 10007 2006-06-25 2007-06-20
10 10007 2007-08-25 2007-06-29
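The rule can be checked against the sample data with a short Python sketch. This sketch is not part of the question.

It sorts each employee/department's periods by start date. It merges a period into the previous one when the gap is strictly less than 45 days. The gap is counted from the previous EMP_EFF_TO_DATE to the next EMP_EFF_FROM_DATE. The strict comparison is my reading of the rule above. The names `consolidate`, `ROWS` and `MAX_GAP` are hypothetical. Keeping the larger end date on a merge also copes with overlapping periods.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE)
ROWS = [
    (10, 10001, date(2008, 8, 1), date(2009, 10, 31)),
    (10, 10001, date(2009, 11, 1), date(2010, 2, 25)),
    (10, 10001, date(2010, 2, 26), date(2011, 5, 1)),
    (10, 10001, date(2011, 8, 1), date(2011, 10, 30)),
    (10, 10001, date(2011, 12, 1), date(2012, 10, 31)),
    (10, 10003, date(2007, 7, 1), date(2007, 10, 31)),
    (10, 10004, date(2004, 9, 27), date(2006, 6, 8)),
    (10, 10004, date(2006, 6, 30), date(2007, 6, 29)),
    (10, 10007, date(2006, 6, 25), date(2007, 6, 20)),
    (10, 10007, date(2007, 8, 25), date(2008, 5, 25)),
]

MAX_GAP = timedelta(days=45)  # hypothetical name; "continuous" means gap < 45 days


def consolidate(rows, max_gap=MAX_GAP):
    """Merge periods of the same employee/department separated by less than max_gap."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (emp, dept), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        cur_from = cur_to = None
        for _, _, start, end in group:
            if cur_from is not None and start - cur_to < max_gap:
                # Continuous: extend the open period (max() also handles overlaps)
                cur_to = max(cur_to, end)
            else:
                # Gap too large (or first row): close the open period, start a new one
                if cur_from is not None:
                    result.append((emp, dept, cur_from, cur_to))
                cur_from, cur_to = start, end
        result.append((emp, dept, cur_from, cur_to))
    return result


if __name__ == "__main__":
    for row in consolidate(ROWS):
        print(*row)
```

On the sample data this reproduces the expected result except for the last row, where it gives 2007-08-25 to 2008-05-25. No input row for department 10007 ends on 2007-06-29.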
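Answer 0 (score: 2)
I recently had to do something very similar. My first thought was a recursive common table expression (CTE). That works, but it may not be the best solution, depending on how much data is in the table.
It is not clear whether you want to actually delete rows from the database, or just see the consolidated result for the current records.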
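For comparison only, the same consolidation can be done without recursion or a loop. It uses the window-function technique usually called "gaps and islands". This is a different technique from the solutions in this answer.

Each row is flagged as the start of a new island when it begins more than 45 days after the previous period in its department. A running sum of those flags then numbers the islands.

The sketch runs the query on an in-memory SQLite database from Python, so several details are assumptions:
* the table name `test` and the lower-case column names;
* the inclusive `<= 45` threshold;
* SQLite's `julianday()` in place of T-SQL's `DATEDIFF`.

It compares each period only with the one immediately before it, so it assumes periods within a department do not overlap. It also needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Sample rows from the question, dates as ISO-8601 text
SAMPLE = [
    (10, 10001, "2008-08-01", "2009-10-31"),
    (10, 10001, "2009-11-01", "2010-02-25"),
    (10, 10001, "2010-02-26", "2011-05-01"),
    (10, 10001, "2011-08-01", "2011-10-30"),
    (10, 10001, "2011-12-01", "2012-10-31"),
    (10, 10003, "2007-07-01", "2007-10-31"),
    (10, 10004, "2004-09-27", "2006-06-08"),
    (10, 10004, "2006-06-30", "2007-06-29"),
    (10, 10007, "2006-06-25", "2007-06-20"),
    (10, 10007, "2007-08-25", "2008-05-25"),
]

ISLANDS_QUERY = """
WITH prev AS (
    -- End date of the previous period in the same employee/department
    SELECT *, LAG(emp_eff_to_date) OVER (
               PARTITION BY emp_id, dept_id ORDER BY emp_eff_from_date) AS prev_to
    FROM test
), flagged AS (
    -- 1 = starts a new island (first row, or gap above 45 days), else 0
    SELECT *, CASE WHEN julianday(emp_eff_from_date) - julianday(prev_to) <= 45
                   THEN 0 ELSE 1 END AS new_island
    FROM prev
), islands AS (
    -- Running total of the flags numbers the islands
    SELECT *, SUM(new_island) OVER (
               PARTITION BY emp_id, dept_id ORDER BY emp_eff_from_date) AS island
    FROM flagged
)
SELECT emp_id, dept_id, MIN(emp_eff_from_date) AS eff_from, MAX(emp_eff_to_date) AS eff_to
FROM islands
GROUP BY emp_id, dept_id, island
ORDER BY emp_id, dept_id, eff_from
"""


def consolidate_islands(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (emp_id INTEGER, dept_id INTEGER, "
                "emp_eff_from_date TEXT, emp_eff_to_date TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    result = con.execute(ISLANDS_QUERY).fetchall()
    con.close()
    return result
```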
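Solution 1 (SQL Fiddle)
This uses a recursive CTE to select the results. It finds the next row whose start date is within 45 days of the end date reached so far, and keeps recursing until there are no more matches. It then keeps only the result from the deepest recursion level for each start date (the MaxRecursion field). Finally, it excludes every other row whose date range falls inside the range of a row that starts earlier.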
WITH CTE AS
( SELECT *, [Recursion] = 0
FROM T
UNION ALL
SELECT T.EMP_ID,
T.DEPT_ID,
T.EMP_EFF_FROM_DATE,
T2.EMP_EFF_TO_DATE,
T.[Recursion] + 1
FROM CTE T
INNER JOIN T T2
ON T2.EMP_ID = T.EMP_ID
AND T.DEPT_ID = T2.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
), CTE2 AS
( SELECT *,
[MaxRecursion] = MAX(Recursion) OVER(PARTITION BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE)
FROM CTE
)
SELECT T.EMP_ID,
T.DEPT_ID,
T.EMP_EFF_FROM_DATE,
T.EMP_EFF_TO_DATE
FROM CTE2 T
WHERE Recursion = MaxRecursion
AND NOT EXISTS
( SELECT 1
FROM CTE2 T2
WHERE T.EMP_ID = T2.EMP_ID
AND T.DEPT_ID = T2.DEPT_ID
-- Exclude a range that lies inside a range starting earlier
AND T.EMP_EFF_FROM_DATE > T2.EMP_EFF_FROM_DATE
AND T.EMP_EFF_TO_DATE <= T2.EMP_EFF_TO_DATE
)
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE;
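The idea behind Solution 1 can be sketched in Python. Each period is extended through a chain of later periods that start within 45 days of the end reached so far. Any period that lies inside one starting earlier is then dropped.

This is an approximation, not a translation of the CTE. The recursion becomes a `while` loop. Where the CTE keeps the deepest recursion level, this sketch greedily jumps to the candidate that ends latest. Both give the same result on the sample data. The names `follow_chain` and `consolidate_by_chains` are hypothetical.

```python
from datetime import date, timedelta

# Sample rows from the question: (EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE)
ROWS = [
    (10, 10001, date(2008, 8, 1), date(2009, 10, 31)),
    (10, 10001, date(2009, 11, 1), date(2010, 2, 25)),
    (10, 10001, date(2010, 2, 26), date(2011, 5, 1)),
    (10, 10001, date(2011, 8, 1), date(2011, 10, 30)),
    (10, 10001, date(2011, 12, 1), date(2012, 10, 31)),
    (10, 10003, date(2007, 7, 1), date(2007, 10, 31)),
    (10, 10004, date(2004, 9, 27), date(2006, 6, 8)),
    (10, 10004, date(2006, 6, 30), date(2007, 6, 29)),
    (10, 10007, date(2006, 6, 25), date(2007, 6, 20)),
    (10, 10007, date(2007, 8, 25), date(2008, 5, 25)),
]

WINDOW = timedelta(days=45)  # inclusive, like DATEADD(DAY, 45, ...) in the CTE


def follow_chain(row, rows):
    """Extend row's end date with later periods of the same employee/department
    that start no more than 45 days after the end reached so far."""
    emp, dept, start, end = row
    while True:
        candidate_ends = [r[3] for r in rows
                          if r[0] == emp and r[1] == dept
                          and r[2] > start and r[3] > end
                          and r[2] <= end + WINDOW]
        if not candidate_ends:
            return (emp, dept, start, end)
        # Greedy step (an assumption): jump to the candidate that ends latest
        end = max(candidate_ends)


def consolidate_by_chains(rows):
    extended = [follow_chain(r, rows) for r in rows]
    # Same role as the final NOT EXISTS: drop periods covered by an earlier-starting one
    return sorted(
        r for r in extended
        if not any(o[0] == r[0] and o[1] == r[1] and o[2] < r[2] and o[3] >= r[3]
                   for o in extended)
    )
```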
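Solution 2 (SQL Fiddle)
This actually updates the existing rows and deletes the redundant ones, so you can get the desired result simply by selecting from the table. If you don't want to delete anything from the database, insert the data into a temporary table and apply the same principle (Example here). In my case this solution ran much faster than the recursive CTE. At each iteration of the loop the query processes less data than the recursive CTE does.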
WHILE EXISTS
( SELECT 1
FROM T
INNER JOIN T T2
ON T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
)
BEGIN
-- Extend each period to the end of a later period that starts within
-- 45 days of its end and finishes after it
UPDATE T
SET EMP_EFF_TO_DATE = T2.EMP_EFF_TO_DATE
FROM T
INNER JOIN
( SELECT *
FROM T
) T2
ON T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE);
-- Remove periods that now lie entirely inside a period that starts earlier
DELETE T
FROM T
WHERE EXISTS
( SELECT 1
FROM T T2
WHERE T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE < T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE >= T.EMP_EFF_TO_DATE
);
END;
SELECT *
FROM T
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE;
Both solutions differ from the expected output in the last row, which looks like a mistake in the question.
I think this row:
10 10007 2007-08-25 2007-06-29
should be:
10 10007 2007-08-25 2008-05-25
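Solution 2's update-then-delete loop can be modelled in Python to see how it converges.

Each pass has two steps:
* Every period reads the state left by the previous pass, as a single UPDATE statement does. It takes the latest end date among later periods that start within 45 days of its end and finish after it.
* Periods covered by an earlier-starting period are then removed.

The loop stops when a pass changes nothing. The conditions are my assumptions about the intended logic, not a literal copy of the SQL. The name `consolidate_by_passes` is hypothetical. On the sample data the result includes the corrected last row.

```python
from datetime import date, timedelta

# Sample rows from the question: (EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE)
ROWS = [
    (10, 10001, date(2008, 8, 1), date(2009, 10, 31)),
    (10, 10001, date(2009, 11, 1), date(2010, 2, 25)),
    (10, 10001, date(2010, 2, 26), date(2011, 5, 1)),
    (10, 10001, date(2011, 8, 1), date(2011, 10, 30)),
    (10, 10001, date(2011, 12, 1), date(2012, 10, 31)),
    (10, 10003, date(2007, 7, 1), date(2007, 10, 31)),
    (10, 10004, date(2004, 9, 27), date(2006, 6, 8)),
    (10, 10004, date(2006, 6, 30), date(2007, 6, 29)),
    (10, 10007, date(2006, 6, 25), date(2007, 6, 20)),
    (10, 10007, date(2007, 8, 25), date(2008, 5, 25)),
]

WINDOW = timedelta(days=45)  # inclusive, like DATEADD(DAY, 45, ...)


def consolidate_by_passes(rows):
    periods = [tuple(r) for r in rows]
    while True:
        # UPDATE step: each period reads the state from before this pass
        updated = []
        for emp, dept, start, end in periods:
            later_ends = [q[3] for q in periods
                          if q[0] == emp and q[1] == dept
                          and q[2] > start and q[3] > end
                          and q[2] <= end + WINDOW]
            updated.append((emp, dept, start, max(later_ends, default=end)))
        if updated == periods:
            # Nothing was extended: the WHILE EXISTS condition is false
            return sorted(periods)
        # DELETE step: drop periods covered by one that starts earlier
        periods = [r for r in updated
                   if not any(o[0] == r[0] and o[1] == r[1]
                              and o[2] < r[2] and o[3] >= r[3]
                              for o in updated)]
```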
Answer 1 (score: 1)
Assuming the "next row" is determined by the EMP_EFF_FROM_DATE field (sorted), here is one way to do it:
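WITH DATA
AS (SELECT *,
-- Number each employee's periods within a department in start-date order
Row_number()
OVER (
PARTITION BY EMP_ID, DEPT_ID
ORDER BY EMP_EFF_FROM_DATE) AS RN
FROM TEST)
SELECT t1.*
FROM DATA t1
INNER JOIN DATA t2
ON t1.EMP_ID = t2.EMP_ID
AND t1.DEPT_ID = t2.DEPT_ID
AND t1.RN = t2.RN - 1
WHERE Datediff(DAY, t1.EMP_EFF_TO_DATE, t2.EMP_EFF_FROM_DATE) <= 45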
The full solution is here.
Let me know if this is not what you want.
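To see what this kind of query returns, here is a sketch that runs the same ROW_NUMBER self-join on an in-memory SQLite database from Python.

It partitions by both EMP_ID and DEPT_ID so that only periods in the same department are compared. That is an assumption about the intent. The lower-case table/column names and SQLite's `julianday()` in place of `DATEDIFF` are also assumptions. SQLite 3.25 or later is needed for `ROW_NUMBER`.

The query returns the first row of each pair of consecutive periods whose gap is at most 45 days. It does not return merged periods.

```python
import sqlite3

# Sample rows from the question, dates as ISO-8601 text
SAMPLE = [
    (10, 10001, "2008-08-01", "2009-10-31"),
    (10, 10001, "2009-11-01", "2010-02-25"),
    (10, 10001, "2010-02-26", "2011-05-01"),
    (10, 10001, "2011-08-01", "2011-10-30"),
    (10, 10001, "2011-12-01", "2012-10-31"),
    (10, 10003, "2007-07-01", "2007-10-31"),
    (10, 10004, "2004-09-27", "2006-06-08"),
    (10, 10004, "2006-06-30", "2007-06-29"),
    (10, 10007, "2006-06-25", "2007-06-20"),
    (10, 10007, "2007-08-25", "2008-05-25"),
]

PAIR_QUERY = """
WITH data AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY emp_id, dept_id
                                 ORDER BY emp_eff_from_date) AS rn
    FROM test
)
SELECT t1.emp_id, t1.dept_id, t1.emp_eff_from_date, t1.emp_eff_to_date
FROM data t1
JOIN data t2
  ON t2.emp_id = t1.emp_id
 AND t2.dept_id = t1.dept_id
 AND t2.rn = t1.rn + 1
-- Days from this period's end to the next period's start
WHERE julianday(t2.emp_eff_from_date) - julianday(t1.emp_eff_to_date) <= 45
ORDER BY t1.emp_id, t1.dept_id, t1.emp_eff_from_date
"""


def rows_followed_within_45_days(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (emp_id INTEGER, dept_id INTEGER, "
                "emp_eff_from_date TEXT, emp_eff_to_date TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    result = con.execute(PAIR_QUERY).fetchall()
    con.close()
    return result
```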