我正在编写脚本以从具有数百万行的数据库中获取数据,并且存在时段间隙的问题。我们已经决定,不应该将不到10天的差距视为差距。因此,应删除这些差距(参见下面的例子。大胆的日期构成了“真正的”感兴趣的时期)
因此,出现了几个问题。第一个问题是确定哪个Outdates和Indates在转换为单个期间彼此接近。下一个问题是将Outdate从较高的行号移动到较低的行号(在表中)。最后一个问题是识别并删除现在重复的行。
我试图解决下面的问题。表#t4a解决了前两个问题。表#t4aa中的策略是通过在新的(虚拟)变量中标记所讨论的重复行来消除重复,并在稍后的阶段中去除所有这些值(1:s)。但是,它不起作用!所有行都标有0,甚至是那些应标记为1的行。任何建议?
- 此临时表测量间隙并创建一个新变量OutDate2,在小间隙(小于11天)的情况下,在行上写下一个Outdate而不是原始值。
WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4)
SELECT cur.Rownum, cur.Id, cur.InDate CurInDate, cur.OutDate, nxt.InDate NxtInDate, DATEDIFF(day, cur.OutDate, nxt.InDate) Number_of_days,
CASE WHEN DATEDIFF(day, cur.OutDate, nxt.InDate)<11 AND DATEDIFF(day, cur.OutDate, nxt.InDate)>0 THEN nxt.OutDate ELSE cur.OutDate END AS OutDate2
INTO #t4a
FROM C cur
LEFT OUTER JOIN C nxt ON (nxt.rownum=cur.rownum+1 AND nxt.Id=cur.Id)
- 此临时表创建一个虚拟对象,用于标识行的OVERLAP,以便在以后的临时表中消除这些行。这个表不起作用。
WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) rownum FROM #t4a)
SELECT cur.Id, cur.InDate, nxt.OutDate2,
CASE WHEN cur.OutDate2 < nxt.InDate THEN 1.0 ELSE 0.0
END AS Overlap
INTO #t4aa
FROM C cur
LEFT OUTER JOIN C nxt on (cur.rownum=nxt.rownum+1 AND cur.Id=nxt.Id)
答案 0 :(得分:1)
这是一种概念,但可能会给你一些想法
WITH C AS
(SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4)
select Cgood.*
from c
join C as Cgood
on Cgood.ID = C1.ID
and Cgood.Rownum = C.Rownum + 1
and DATEDIFF(day, C.OutDate, nxt.InDate)>=11
group by Cgood.*
union
select Cgood.*
from c
join C as Cgood
on Cgood.ID = C1.ID
and Cgood.Rownum = 1
and C.Rownum = 2
and DATEDIFF(day, C.OutDate, nxt.InDate)>=11
group by Cgood.*
union
select cMerge.ID, c.Indate, cMerge.OutDate
from c
join C as cMerge
on cMerge.ID = C1.ID
and cMerge.Rownum = C.Rownum + 1
and DATEDIFF(day, C.OutDate, cMerge.InDate) < 11
group by cMerge.ID, c.Indate, cMerge.OutDate
union
select cMerge.ID, c.Indate, cMerge.OutDate
from c
join C as cMerge
on cMerge.ID = C1.ID
and cMerge.Rownum = 1
and C.Rownum = 2
and DATEDIFF(day, C.OutDate, cMerge.InDate) < 11
group b
答案 1 :(得分:1)
WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4)
SELECT cur.Rownum, cur.Id, cur.InDate CurInDate, cur.OutDate, nxt.InDate NxtInDate, DATEDIFF(day, cur.OutDate, nxt.InDate) Number_of_days,
CASE
WHEN DATEDIFF(day, prv.OutDate, cur.InDate)<11
AND DATEDIFF(day, prv.OutDate, cur.InDate)>0
THEN 1.0
ELSE 0.0
END AS Overlap,
CASE
WHEN DATEDIFF(day, cur.OutDate, nxt.InDate)<11
AND DATEDIFF(day, cur.OutDate, nxt.InDate)>0
THEN nxt.OutDate
ELSE cur.OutDate
END AS OutDate2
INTO #t4a
FROM C cur
LEFT OUTER JOIN C prv ON (prv.rownum=cur.rownum-1 AND prv.Id=cur.Id)
LEFT OUTER JOIN C nxt ON (nxt.rownum=cur.rownum+1 AND nxt.Id=cur.Id)