If two values fall in the same month, add a month, then repeat the validation check (recursion?)

Asked: 2010-10-01 12:45:12

Tags: sql sql-server sql-server-2005 tsql

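I'm looking at some fairly dodgy data, and my task is to find runs of consecutive months. However, I've noticed that in some of my reference tables individual date values have been miscoded: where there are multiple values within one month, the latest value should be moved to the following month.

For example: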

Contact_id   | Original Payment Date
id001        | 02/07/2003
id001        | 30/07/2003 -- should be changed to 30/08/2003
id001        | 01/09/2003
id001        | 01/10/2003
id001        | 30/10/2003 -- should be changed to 30/11/2003
id001        | 02/12/2003
id001        | 31/12/2003 -- should be changed to 31/01/2004
id001        | 30/01/2004
id001        | 03/03/2004

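However, I've found two problems with a simple DATEADD approach:

1) If a payment needs to move into the next month and its day is greater than the next month allows (e.g. 31/01/2003 cannot become 31/02/2003), I don't know how DATEADD behaves in that case.

2) If I made those changes in the example above, we would have the following data: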

id001        | 02/07/2003
id001        | 30/08/2003
id001        | 01/09/2003
id001        | 01/10/2003
id001        | 30/11/2003
id001        | 02/12/2003
id001        | 31/01/2004 -- should now be changed to a value in February 2004, as the previous amendment created duplicates in January 2004
id001        | 30/01/2004
id001        | 03/03/2004

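Although I believe two 'loops' of changes would make sure all the data is correct, I can't be certain. So I'd really like some way of pushing the latest date in a month (where there are 2 values in that month) forward by one month and, if possible, repeating until there are no duplicates left.

I'm using SQL Server 2005 and the reference table has around 20 million rows, so I'd rather not use a cursor if possible :)

Thanks!

UPDATE: the script I used for the first pass of date updates is this: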

;WITH cte1 AS (  
SELECT  contact_id
        ,value_net
        ,DATEPART(YEAR, date_received)*12 + DATEPART(MONTH, date_received) -  
        DENSE_RANK() OVER  
                   (PARTITION BY contact_id  
                    ORDER BY DATEPART(YEAR, date_received)*12 + DATEPART(MONTH, date_received)) AS dategroup
        ,DENSE_RANK() OVER  
                   (PARTITION BY contact_id  
                    ORDER BY DATEPART(YEAR, date_received)*12 + DATEPART(MONTH, date_received)) AS rnk
        ,ROW_NUMBER() OVER  
                   (PARTITION BY contact_id  
                    ORDER BY DATEPART(YEAR, date_received)*12 + DATEPART(MONTH, date_received)) AS rnk2

        ,date_received 
FROM    donation with (nolock)
WHERE contact_id IS NOT NULL
 )
,cte2 AS
(
SELECT 
c1.contact_id
,c1.value_net
,c1.dategroup
,CASE WHEN c1.rnk = c2.rnk AND c1.rnk2 > c2.rnk2 THEN DATEADD(MM,+1,c1.date_received) ELSE c1.date_received END as date_received
from cte1 c1
LEFT OUTER JOIN cte1 c2 WITH (nolock) ON c2.contact_id = c1.contact_id AND c2.rnk = c1.rnk AND c2.rnk2 = c1.rnk2-1
)

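1 Answer:

Answer 0: (score: 1)

1) DATEADD() with MONTH and 1 applied to 2010-01-31 will result in 2010-02-28.

2) The problem is the duplicates, and identifying a particular row within the duplicates. For example, you want to shift all but the first, but if several rows tie for first, none of them will be shifted. That is, you are doing something like this: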

UPDATE dates
SET dt = DATEADD(MONTH, 1, dt)
WHERE YEAR(dt) * 100 + MONTH(dt) IN (
    SELECT YEAR(dt) * 100 + MONTH(dt)
    FROM dates
    GROUP BY YEAR(dt), MONTH(dt)
    HAVING COUNT(*) > 1
)
AND dt NOT IN (
    SELECT MIN(dt)
    FROM dates
    GROUP BY YEAR(dt), MONTH(dt)
    HAVING COUNT(*) > 1
)

You can modify this to use ROW_NUMBER() OVER() to uniquely identify the rows.
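
As a rough illustration of that ROW_NUMBER() idea, here is a minimal sketch of a set-based loop that, on each pass, pushes only the latest row of every contact/month group holding more than one row forward by a month, and stops when a pass changes nothing. The table and column names (donation, contact_id, date_received) are taken from the script in the question; donation_id is a hypothetical unique key assumed here purely to break ties between rows sharing the same date_received. It has not been run against the real data, so treat it as a starting point rather than a finished solution.

```sql
-- Sketch only. donation_id is an ASSUMED unique key (not in the question);
-- substitute whatever uniquely identifies a row in your table.
DECLARE @rows INT;
SET @rows = 1;  -- separate SET: SQL Server 2005 has no DECLARE initialisers

WHILE @rows > 0
BEGIN
    ;WITH ranked AS (
        SELECT  date_received
                -- 1 = latest row within this contact's calendar month
               ,ROW_NUMBER() OVER (
                    PARTITION BY contact_id, DATEDIFF(MONTH, 0, date_received)
                    ORDER BY date_received DESC, donation_id DESC) AS rn_latest
                -- how many rows share this contact's calendar month
               ,COUNT(*) OVER (
                    PARTITION BY contact_id, DATEDIFF(MONTH, 0, date_received)) AS cnt
        FROM    donation
        WHERE   contact_id IS NOT NULL
    )
    UPDATE ranked
    SET    date_received = DATEADD(MONTH, 1, date_received)  -- clamps to month end
    WHERE  rn_latest = 1
      AND  cnt > 1;

    SET @rows = @@ROWCOUNT;  -- 0 once no month holds more than one row
END
```

Each pass scans the whole table, so on roughly 20 million rows it may be worth restricting the CTE to contacts that still have duplicate months, or running it against a staging copy first. Note that this moves the latest row per month, as the question asks, whereas the UPDATE above moves every row except the earliest.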