Question

I have a query that returns results like the below:

   RowID IP         datetime1         datetime2     temp_violation
   ---------------------------------------------------------------
   1     'A'        '1-1-19'          '1-2-19'      0
   2     'A'        '1-2-19'          '1-3-19'      0
   3     'A'        '1-3-19'          '1-4-19'      0
   4     'A'        '1-4-19'          '1-5-19'      1
   5     'A'        '1-5-19'          '1-6-19'      1
   6     'A'        '1-6-19'          '1-7-19'      1
   7     'A'        '1-7-19'          '1-8-19'      0
   8     'A'        '1-8-19'          '1-9-19'      0
   9     'A'        '1-9-19'          '1-10-19'     0
   10    'B'        '1-1-19'          '1-2-19'      0
   11    'B'        '1-2-19'          '1-3-19'      0
   12    'B'        '1-3-19'          '1-4-19'      0
   13    'B'        '1-4-19'          '1-5-19'      1
   14    'B'        '1-5-19'          '1-6-19'      1
   15    'B'        '1-6-19'          '1-7-19'      1
   16    'B'        '1-7-19'          '1-8-19'      0
   17    'B'        '1-8-19'          '1-9-19'      0
   18    'B'        '1-9-19'          '1-10-19'     0

For each IP, I need to return a result set like this:

   RowID IP         datetime1         datetime2     temp_violation  groupnum
   -------------------------------------------------------------------------
   1     'A'        '1-1-19'          '1-2-19'      0               1
   2     'A'        '1-2-19'          '1-3-19'      0               1
   3     'A'        '1-3-19'          '1-4-19'      0               1
   4     'A'        '1-4-19'          '1-5-19'      1               2
   5     'A'        '1-5-19'          '1-6-19'      1               2
   6     'A'        '1-6-19'          '1-7-19'      1               2
   7     'A'        '1-7-19'          '1-8-19'      0               3
   8     'A'        '1-8-19'          '1-9-19'      0               3
   9     'A'        '1-9-19'          '1-10-19'     0               3
   10    'B'        '1-1-19'          '1-2-19'      0               1
   11    'B'        '1-2-19'          '1-3-19'      0               1
   12    'B'        '1-3-19'          '1-4-19'      0               1
   13    'B'        '1-4-19'          '1-5-19'      1               2
   14    'B'        '1-5-19'          '1-6-19'      1               2
   15    'B'        '1-6-19'          '1-7-19'      1               2
   16    'B'        '1-7-19'          '1-8-19'      0               3
   17    'B'        '1-8-19'          '1-9-19'      0               3
   18    'B'        '1-9-19'          '1-10-19'     0               3

So for example: for IP A the violations change from 0/0/0 to 1/1/1 to 0/0/0, so the query needs to recognize the first 0/0/0 as group 1, then recognize 1/1/1 as group 2, and finally the third 0/0/0 as group 3.

For IP B I've restarted the numbering from 1, but it doesn't need to restart: the groups could just as well be labeled 4, 5, and 6. The only thing that matters is that, within each IP, every run of consecutive rows with the same temp_violation value gets its own unique group number.

The tricky part is that I don't want to loop through every row, because there are potentially millions of rows. I'm not well versed in CTEs (I don't even know if they would help here). I tried a bunch of things with row_number(), rank(), dense_rank(), and ntile(), but I couldn't find a clever way to use them to achieve this.

Answer 1

This is a gaps-and-islands problem. The simplest method might be lag() and a cumulative sum:

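-- Partition by the question's IP column; the running sum adds 1 at every change in temp_violation.
select t.*,
       sum(case when temp_violation = prev_tv then 0 else 1 end) over (partition by ip order by rowid) as groupnum
from (select t.*,
             -- prev_tv is NULL on the first row of each IP, so that row starts group 1
             lag(temp_violation) over (partition by ip order by rowid) as prev_tv
      from t
     ) t;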
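
To see what the lag() version computes, here is a minimal sketch run against a copy of the IP 'A' rows from the question. The table variable @t and the is_start column are names made up for this illustration, and the script assumes SQL Server 2012 or later, since it needs lag() and an ordered sum() over ().

```sql
-- Hypothetical sample data: @t mirrors the IP 'A' rows from the question.
DECLARE @t TABLE (rowid int, ip char(1), temp_violation int);

INSERT INTO @t (rowid, ip, temp_violation)
VALUES (1, 'A', 0), (2, 'A', 0), (3, 'A', 0),
       (4, 'A', 1), (5, 'A', 1), (6, 'A', 1),
       (7, 'A', 0), (8, 'A', 0), (9, 'A', 0);

SELECT s.rowid, s.ip, s.temp_violation, s.prev_tv,
       -- 1 on the first row of each run, 0 otherwise (NULL prev_tv falls into ELSE)
       CASE WHEN s.temp_violation = s.prev_tv THEN 0 ELSE 1 END AS is_start,
       -- running total of the run starts = group number
       SUM(CASE WHEN s.temp_violation = s.prev_tv THEN 0 ELSE 1 END)
           OVER (PARTITION BY s.ip ORDER BY s.rowid) AS groupnum
FROM (SELECT t.*,
             LAG(t.temp_violation) OVER (PARTITION BY t.ip ORDER BY t.rowid) AS prev_tv
      FROM @t AS t
     ) AS s
ORDER BY s.rowid;
```

With this data, is_start is 1 on rows 1, 4, and 7 only, so groupnum comes out as 1 for rows 1-3, 2 for rows 4-6, and 3 for rows 7-9.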

Oops, I noticed you are using SQL Server 2008, so you don't have lag() or a cumulative sum with order by in the over clause (both arrived in SQL Server 2012). In this case, the difference of row numbers is the better approach:

-- Rank the runs within each IP by the first rowid of each run.
select t.*,
       dense_rank() over (partition by ip order by min_rowid) as groupnum
from (select t.*,
             -- seqnum - seqnum_2 is constant within a run of equal temp_violation values
             min(rowid) over (partition by ip, temp_violation, seqnum - seqnum_2) as min_rowid
      from (select t.*,
                   row_number() over (partition by ip order by rowid) as seqnum,
                   row_number() over (partition by ip, temp_violation order by rowid) as seqnum_2
            from t
           ) t
     ) t;
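
The reason the difference of row numbers identifies a run is easier to see from the intermediate values. The sketch below only exposes them; it is not a replacement for the query above. The table variable @t and the diff column are names made up for this illustration, and it should run on SQL Server 2008, since it uses only row_number() and a multi-row VALUES list.

```sql
-- Hypothetical sample data: @t mirrors the IP 'A' rows from the question.
DECLARE @t TABLE (rowid int, ip char(1), temp_violation int);

INSERT INTO @t (rowid, ip, temp_violation)
VALUES (1, 'A', 0), (2, 'A', 0), (3, 'A', 0),
       (4, 'A', 1), (5, 'A', 1), (6, 'A', 1),
       (7, 'A', 0), (8, 'A', 0), (9, 'A', 0);

SELECT s.rowid, s.ip, s.temp_violation,
       s.seqnum,                      -- position within the IP
       s.seqnum_2,                    -- position within (IP, temp_violation)
       s.seqnum - s.seqnum_2 AS diff  -- constant inside one run
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY t.ip ORDER BY t.rowid) AS seqnum,
             ROW_NUMBER() OVER (PARTITION BY t.ip, t.temp_violation ORDER BY t.rowid) AS seqnum_2
      FROM @t AS t
     ) AS s
ORDER BY s.rowid;

-- rows 1-3: seqnum 1,2,3  seqnum_2 1,2,3  diff 0  (temp_violation 0)
-- rows 4-6: seqnum 4,5,6  seqnum_2 1,2,3  diff 3  (temp_violation 1)
-- rows 7-9: seqnum 7,8,9  seqnum_2 4,5,6  diff 3  (temp_violation 0)
```

Rows 4-6 and rows 7-9 share diff = 3, so the difference alone is not enough to separate them. That is why the query partitions by temp_violation as well as by the difference before taking min(rowid).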
