使用分组子句时如何避免重叠日期范围?

时间:2011-01-06 21:28:53

标签: sql tsql

我有一种情况,我需要在价值变化之间找到时间跨度。我尝试了一个简单的group by子句,但它消除了重叠的更改。请考虑以下示例:

create table #items (
      code varchar(4)
    , class varchar(4)
    , txdate datetime
)

insert into #items (code, class, txdate) values ('A', 'C', '2010-01-01');
insert into #items (code, class, txdate) values ('A', 'C', '2010-01-02');
insert into #items (code, class, txdate) values ('A', 'C', '2010-01-03');
insert into #items (code, class, txdate) values ('A', 'D', '2010-01-04');
insert into #items (code, class, txdate) values ('A', 'D', '2010-01-05');
insert into #items (code, class, txdate) values ('A', 'C', '2010-01-06');
insert into #items (code, class, txdate) values ('A', 'C', '2010-01-07');
insert into #items (code, class, txdate) values ('A', 'D', '2010-01-08');
insert into #items (code, class, txdate) values ('A', 'D', '2010-01-09');

select code
, class
, min(txdate) mindate
, max(txdate) maxdate
from #items
group by code, class

返回以下结果(注意重叠的日期范围):

|code|class|mindate   |maxdate   |
----------------------------------
|A   |C    |2010-01-01|2010-01-07|
|A   |D    |2010-01-04|2010-01-09|

我想让查询返回以下内容:

|code|class|mindate   |maxdate   |
----------------------------------
|A   |C    |2010-01-01|2010-01-03|
|A   |D    |2010-01-04|2010-01-05|
|A   |C    |2010-01-06|2010-01-07|
|A   |D    |2010-01-08|2010-01-09|

有任何想法和建议吗?

4 个答案:

答案 0 :(得分:2)

编辑:正如评论中指出的那样,这仍然不太正确。

;with cteNtile as (
    select code, class, txdate, 
           ntile((select count(*) from (select NULL as dummy from #items group by code, class) a)) over(partition by code, class order by txdate) as tilenum
        from #items
)
select code, class, MIN(txdate) as mindate, MAX(txdate) as maxdate
    from cteNtile
    group by code, class, tilenum
    order by mindate, maxdate

答案 1 :(得分:2)

以下是可以为您提供所需结果的查询。

;WITH items1 AS (
SELECT ROW_NUMBER() OVER (ORDER BY txdate) rowid, code, class, txdate
from #items
),
items2 AS (
SELECT ROW_NUMBER() OVER (ORDER BY rowid) id, rowid, i1.Code, i1.Class, i1.txdate
FROM items1 i1
WHERE NOT EXISTS (SELECT 1 FROM items1 i2
                  WHERE i2.txdate < i1.txdate
                  AND i2.class = i1.class
                  AND i2.Code = i1.Code 
                  AND i2.rowid+1=i1.rowid)
)
SELECT items2.code, items2.class, items2.txdate mindate, items1.txdate maxdate
FROM items2,  items2 items3, items1 
WHERE (items2.id+1=items3.id AND items3.rowid-1=items1.rowid)
OR items2.rowid = (SELECT MAX(t.rowid) FROM items1 t)
UNION 
SELECT items2.code, items2.class, MAX(items2.txdate) mindate, MAX(items1.txdate) maxdate
FROM items2, items1 
WHERE items1.class = items2.class
GROUP BY items1.class, items2.class, items2.code, items2.class
ORDER BY items2.txdate

答案 2 :(得分:0)

我认为你不能用一个简单的选择语句来做到这一点。

您可以使用游标迭代行并识别“类”更改。

答案 3 :(得分:0)

按照@KM的建议研究SQL SERVER ISLANDS之后,我提出了以下查询,当其他类代码添加到数据集时,它似乎运行良好。

select a.code, a.class, a.txdate as mindate, b.txdate as maxdate
from (
    --Find minimum island
    select code
        , class
        , txdate
        , row_number() over (order by code, class, txdate) as n
    from #items tb1
    where not exists (
        select *
        from #items tb2
        where datediff(d, tb1.txdate, tb2.txdate) = -1      
          and tb1.class = tb2.class
          and tb1.code = tb2.code
    ) 
) as a
inner join (
    --Find maximum island 
    select code
        , class
        , txdate
        , row_number() over (order by code, class, txdate) as n
    from #items tb1
    where not exists (
        select *
        from #items tb2
        where datediff(d, tb1.txdate, tb2.txdate) = 1   
          and tb1.class = tb2.class
          and tb1.code = tb2.code
    ) 
) as b on a.n = b.n

这种方法唯一需要注意的是,最小集合中的条目数量需要与最大集合中的条目数量相匹配。到目前为止,我还没有做任何会导致这种情况不正确的事情。但是,我没有测试空值或性能。