Gaps and islands with 3 columns fails in SQL Server

Time: 2018-07-03 16:02:50

Tags: sql sql-server gaps-and-islands gaps-in-data

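I am seeing strange behavior with a gaps-and-islands solution. With 3 columns (the third one non-integer), the results look effectively random. Suppose we run the following query: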

Declare @Table1 TABLE
(
    ID varchar(50), 
    yr float, 
    CO1 varchar(50)
);

INSERT INTO @Table1 (ID, yr, CO1)
VALUES ('I2','2011','ABE'), ('I2','2012','ABE'), ('I2','2013','ABE'),
       ('I2','2014','ABE'), ('I2','2014','ABE'), ('I2','2005','ABD'),
       ('I2','2006','ABD'), ('I2','2007','ABD'), ('I2','2008','ABD'),
       ('I2','2007','ABA CD'), ('I2','2011','ABA CD'), ('I2','2013','ABA CD');

SELECT 
    ID, CO1, StartSeqNo = MIN(yr), EndSeqNo = MAX(yr)
FROM 
    (SELECT 
         ID, yr, CO1,
         rn = yr - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY yr)
     FROM 
         @Table1) a
GROUP BY 
    ID, CO1, rn ;

The result I want is:

ID  CO1    StartSeqNo   EndSeqNo
----------------------------
I2  ABA CD    2007       2007
I2  ABA CD    2011       2011
I2  ABA CD    2013       2013
I2  ABD       2005       2008
I2  ABE       2011       2014

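I have checked Stack Overflow and elsewhere to see whether I am missing something. I have tried DISTINCT and DENSE_RANK, but neither gives the correct result.

Here are the DISTINCT and DENSE_RANK queries I have already tried: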

--- distinct 

SELECT distinct ID,CO1, StartSeqNo=MIN(yr), EndSeqNo=MAX(yr)
FROM (
    SELECT distinct ID, yr, CO1
        ,rn=yr-ROW_NUMBER() OVER (PARTITION BY ID ORDER BY yr)
    FROM @Table1) a
GROUP BY ID, CO1, rn ;

--- with dense_rank
SELECT ID,CO1, StartSeqNo=MIN(yr), EndSeqNo=MAX(yr)
FROM (
    SELECT ID, yr, CO1
        ,rn=yr-dense_rank() OVER (PARTITION BY ID ORDER BY yr)
    FROM @Table1) a
GROUP BY ID, CO1, rn ;

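I don't understand why the gaps-and-islands query does not work with a non-integer column. I think there is a grouping problem somewhere. Please help me solve this.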

Sim

3 answers:

Answer 0: (score: 1)

You need DENSE_RANK because you have multiple rows with the same ID/year combination, and you need to add CO1 to the PARTITION BY:

SELECT 
    ID, CO1, StartSeqNo = MIN(yr), EndSeqNo = MAX(yr)
FROM 
    (SELECT 
         ID, yr, CO1,
         rn = yr - dense_rank() OVER (PARTITION BY ID, CO1 ORDER BY yr)
     FROM 
         @Table1) a
GROUP BY 
    ID, CO1, rn ;
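To make the arithmetic behind this query easier to follow, here is a minimal Python sketch that mimics `yr - DENSE_RANK() OVER (PARTITION BY ID, CO1 ORDER BY yr)` followed by the `GROUP BY`. It is an illustration only, not SQL Server's implementation; the function name `islands` is made up for this example. It also hints at why the original query looked random: with `PARTITION BY ID` alone, rows of different CO1 values share one numbering, and ROW_NUMBER does not guarantee an order among rows with the same yr.

```python
from itertools import groupby

# Sample rows from the question: (ID, yr, CO1)
rows = [
    ("I2", 2011, "ABE"), ("I2", 2012, "ABE"), ("I2", 2013, "ABE"),
    ("I2", 2014, "ABE"), ("I2", 2014, "ABE"), ("I2", 2005, "ABD"),
    ("I2", 2006, "ABD"), ("I2", 2007, "ABD"), ("I2", 2008, "ABD"),
    ("I2", 2007, "ABA CD"), ("I2", 2011, "ABA CD"), ("I2", 2013, "ABA CD"),
]

def islands(rows):
    """Hypothetical helper: group consecutive years per (ID, CO1)."""
    def partition_key(row):
        return (row[0], row[2])  # PARTITION BY ID, CO1

    result = []
    ordered = sorted(rows, key=partition_key)
    for (id_, co1), part in groupby(ordered, key=partition_key):
        # DENSE_RANK gives equal years the same number, so ranking the
        # distinct years yields the same grouping key for every row.
        years = sorted({yr for _, yr, _ in part})
        groups = {}
        for dense_rank, yr in enumerate(years, start=1):
            # yr - dense_rank is constant within a run of consecutive years
            groups.setdefault(yr - dense_rank, []).append(yr)
        for yrs in groups.values():
            result.append((id_, co1, min(yrs), max(yrs)))  # MIN(yr), MAX(yr)
    return sorted(result)

for row in islands(rows):
    print(row)
```

For the sample data this prints the five islands the question asks for: three single-year islands for ABA CD, 2005 to 2008 for ABD, and 2011 to 2014 for ABE.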

Answer 1: (score: 0)

It seems you want:

select id, co1, min(yr), max(yr)
from (select *, (case when max(grp) over(partition by co1) > 1 then grp else 1 end) as grp1
      from (select *, yr - lag(yr, 1, yr) over (partition by id, co1 order by yr) as grp
            from @Table1  -- the table variable declared in the question
           ) t
       ) t
group by id, co1, grp1;

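Answer 2: (score: 0)

Without gaps, the years within each ID/CO1 group would be consecutive numbers, which you can compare against a gapless numbering. That numbering must, of course, also be consecutive within each ID/CO1 group ordered by year. So unless you order by CO1 (before the year), you also have to include CO1 in the partition of the row-numbering function. In addition, your data contains duplicate rows, so to give equal years within an ID/CO1 group the same number, use the RANK function instead of ROW_NUMBER: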

WITH a (ID, CO1, yr, nmbr) AS (
  SELECT ID, CO1, yr
    , yr - RANK() OVER (PARTITION BY ID, CO1 ORDER BY yr)
  FROM @Table1
)
SELECT ID, CO1, StartSeqNo = MIN(yr), EndSeqNo = MAX(yr)
FROM a
GROUP BY ID, CO1, nmbr;
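One caveat about RANK: it skips numbers after ties, so it only keeps the grouping key constant when the duplicated year is the last year of its island, as in the sample data. The Python sketch below compares `yr - RANK()` with `yr - DENSE_RANK()` within a single ID/CO1 group. The helper names and the second year list (with a duplicate in the middle) are made up for illustration and are not taken from the question.

```python
def rank_keys(years):
    """yr - RANK(): tied years share the position of their first occurrence."""
    ordered = sorted(years)
    return [yr - (ordered.index(yr) + 1) for yr in ordered]

def dense_rank_keys(years):
    """yr - DENSE_RANK(): tied years share a number and no numbers are skipped."""
    ordered = sorted(years)
    distinct = sorted(set(ordered))
    return [yr - (distinct.index(yr) + 1) for yr in ordered]

# ABE rows from the question: the duplicate (2014) is the last year of the island.
print(rank_keys([2011, 2012, 2013, 2014, 2014]))   # [2010, 2010, 2010, 2010, 2010]

# Hypothetical data with the duplicate in the middle of the island.
print(rank_keys([2011, 2012, 2012, 2013]))         # [2010, 2010, 2010, 2009]
print(dense_rank_keys([2011, 2012, 2012, 2013]))   # [2010, 2010, 2010, 2010]
```

In the second case RANK splits 2013 off into its own island, while DENSE_RANK keeps 2011 to 2013 together, so DENSE_RANK is likely the safer choice if duplicates can appear anywhere.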

Finally, I suggest using int rather than float for the year numbers.