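I'm looking at some pretty dodgy data, and my task is to work out runs of consecutive months. However, I've noticed that in some of my reference tables individual date values have been miscoded: where a month has more than one value, the latest value should have been assigned to the following month.
For example: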
Contact_id | Original Payment Date
id001 | 02/07/2003
id001 | 30/07/2003 -- should be changed to 30/08/2003
id001 | 01/09/2003
id001 | 01/10/2003
id001 | 30/10/2003 -- should be changed to 30/11/2003
id001 | 02/12/2003
id001 | 31/12/2003 -- should be changed to 31/01/2004
id001 | 30/01/2004
id001 | 03/03/2004
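However, I've found two problems with using a simple DATEADD function:
1) If a payment needs to move into the next month and its day is greater than the next month allows (e.g. 31/01/2003 cannot become 31/02/2003), I don't know how DATEADD will behave in that case.
2) If I make these changes to the example above, we get the following data: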
id001 | 02/07/2003
id001 | 30/08/2003
id001 | 01/09/2003
id001 | 01/10/2003
id001 | 30/11/2003
id001 | 02/12/2003
id001 | 31/01/2004 -- this should now be changed to a value in February 2004, as there are now duplicates in January 2004 created by the previous amendment
id001 | 30/01/2004
id001 | 03/03/2004
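Although I believe two 'loops' of changes would leave all the data correct, I can't be sure, so I'd really like some way of pushing the latest date in a month (where that month has two values) forward by one month and, if possible, repeating until no duplicates remain.
I'm using SQL Server 2005 and the reference table has around 20 million rows, so I'd rather not use a cursor if possible :)
Thanks!
Update: the script I used to update the dates the first time is this: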
;WITH cte1 AS (
    SELECT contact_id
        ,value_net
        ,DATEPART(YEAR, date_received)*12 + DATEPART(MONTH, date_received) -
            DENSE_RANK() OVER
                (PARTITION BY contact_id
                 ORDER BY DATEPART(YEAR, date_received)*12 + DATEPART(MONTH, date_received)) AS dategroup
        ,DENSE_RANK() OVER
            (PARTITION BY contact_id
             ORDER BY DATEPART(YEAR, date_received)*12 + DATEPART(MONTH, date_received)) AS rnk
        ,ROW_NUMBER() OVER
            (PARTITION BY contact_id
             ORDER BY DATEPART(YEAR, date_received)*12 + DATEPART(MONTH, date_received)) AS rnk2
        ,date_received
    FROM donation WITH (nolock)
    WHERE contact_id IS NOT NULL
)
,cte2 AS
(
    SELECT
        c1.contact_id
        ,c1.value_net
        ,c1.dategroup
        ,CASE WHEN c1.rnk = c2.rnk AND c1.rnk2 > c2.rnk2 THEN DATEADD(MM,+1,c1.date_received) ELSE c1.date_received END AS date_received
    FROM cte1 c1
    LEFT OUTER JOIN cte1 c2 WITH (nolock) ON c2.contact_id = c1.contact_id AND c2.rnk = c1.rnk AND c2.rnk2 = c1.rnk2-1
)
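Answer 0 (score: 1)
1) DATEADD() with MONTH and 1 applied to 2010-01-31 will result in 2010-02-28: the day is clamped to the last day of the target month rather than raising an error.
2) The problem is the duplicates, and identifying a particular row among the duplicates. For example, you want to shift all but the first, but if several rows tie for first, none of them will be shifted, i.e. you are doing something like this: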
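UPDATE dates
SET dt = DATEADD(MONTH, 1, dt)
-- only touch months that contain more than one row
WHERE YEAR(dt) * 100 + MONTH(dt) IN (
    SELECT YEAR(dt) * 100 + MONTH(dt)
    FROM dates
    GROUP BY YEAR(dt), MONTH(dt)
    HAVING COUNT(*) > 1
)
-- leave the earliest date in each such month alone;
-- rows that tie on that earliest date are all left alone too
AND dt NOT IN (
    SELECT MIN(dt)
    FROM dates
    GROUP BY YEAR(dt), MONTH(dt)
    HAVING COUNT(*) > 1
)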
You could modify this to use ROW_NUMBER() OVER() to identify the rows uniquely.
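As a rough sketch of that ROW_NUMBER() idea, something along these lines might work. It assumes the `donation` table with `contact_id` and `date_received` columns from the question's script; the CTE name `ranked`, the column alias `rn`, and the variable `@rows` are made up for illustration. It keeps the earliest row per contact per month, pushes every other row forward one month, and repeats until a pass changes nothing, so knock-on duplicates (like the January 2004 case in the question) get picked up on the next pass. It has not been tried against 20 million rows, so treat it as a starting point rather than a tuned solution.

```sql
-- Assumption: donation(contact_id, date_received) as in the question's script.
DECLARE @rows INT;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    -- Number the rows inside each contact/month, earliest date first.
    -- Rows with identical dates get an arbitrary but unique number,
    -- so exactly one row per contact per month has rn = 1.
    ;WITH ranked AS (
        SELECT date_received,
               ROW_NUMBER() OVER (
                   PARTITION BY contact_id,
                                YEAR(date_received),
                                MONTH(date_received)
                   ORDER BY date_received) AS rn
        FROM donation
        WHERE contact_id IS NOT NULL
    )
    -- Updating through the CTE changes the underlying donation rows.
    -- DATEADD clamps the day to the end of a shorter month
    -- (e.g. 31 January becomes 28 or 29 February).
    UPDATE ranked
    SET date_received = DATEADD(MONTH, 1, date_received)
    WHERE rn > 1;

    -- Stop once a pass moves no rows, i.e. no month has duplicates left.
    SET @rows = @@ROWCOUNT;
END
```

Note that on each pass the row kept in a month is simply the earliest date in it, so a row that was shifted in from the previous month can displace a later original row. If the original rows should always win, an extra ordering column would be needed, which is beyond what the question's schema shows.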