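Rank records based on a change in the value of one column

Time: 2017-06-03 13:58:03

Tags: sql sql-server tsql sql-server-2014

Q: How do I rank records based on a change in the value of one column?

I have the following data (https://pastebin.com/vdTb1JRT):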

EmployeeID  Date        Onleave
ABH12345    2016-01-01  0
ABH12345    2016-01-02  0
ABH12345    2016-01-03  0
ABH12345    2016-01-04  0
ABH12345    2016-01-05  0
ABH12345    2016-01-06  0
ABH12345    2016-01-07  0
ABH12345    2016-01-08  0
ABH12345    2016-01-09  0
ABH12345    2016-01-10  1
ABH12345    2016-01-11  1
ABH12345    2016-01-12  1
ABH12345    2016-01-13  1
ABH12345    2016-01-14  0
ABH12345    2016-01-15  0
ABH12345    2016-01-16  0
ABH12345    2016-01-17  0

I want to produce the following result:

 EmployeeID DateValidFrom    DateValidTo     OnLeave
 ABH12345   2016-01-01       2016-01-09      0
 ABH12345   2016-01-10       2016-01-13      1
 ABH12345   2016-01-14       2016-01-17      0

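So I wondered whether I could somehow create a ranking column (as shown below) that increments each time the value in the Onleave column changes, partitioned by the EmployeeID column.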

EmployeeID  Date        Onleave    RankedCol
ABH12345    2016-01-01  0          1
ABH12345    2016-01-02  0          1
ABH12345    2016-01-03  0          1
ABH12345    2016-01-04  0          1
ABH12345    2016-01-05  0          1
ABH12345    2016-01-06  0          1
ABH12345    2016-01-07  0          1
ABH12345    2016-01-08  0          1
ABH12345    2016-01-09  0          1
ABH12345    2016-01-10  1          2
ABH12345    2016-01-11  1          2
ABH12345    2016-01-12  1          2
ABH12345    2016-01-13  1          2
ABH12345    2016-01-14  0          3
ABH12345    2016-01-15  0          3
ABH12345    2016-01-16  0          3
ABH12345    2016-01-17  0          3

Then I could do the following:

SELECT
 [EmployeeID]    = [EmployeeID]
,[DateValidFrom] = MIN([Date])
,[DateValidTo]   = MAX([Date])
,[OnLeave]       = [OnLeave]
FROM RankedData -- placeholder name: any table/view/CTE/sub-query that exposes RankedCol
GROUP BY 
 [EmployeeID]
,[OnLeave]
,[RankedCol]

Other solutions are very welcome.

Here is the test data:

WITH CTE AS ( SELECT EmployeeID = 'ABH12345', [Date] = CAST(N'2016-01-01' AS Date), [Onleave] = 0
UNION SELECT 'ABH12345', CAST(N'2016-01-02' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-03' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-04' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-05' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-06' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-07' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-08' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-09' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-10' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-11' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-12' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-13' AS Date), 1
UNION SELECT 'ABH12345', CAST(N'2016-01-14' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-15' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-16' AS Date), 0
UNION SELECT 'ABH12345', CAST(N'2016-01-17' AS Date), 0
)

SELECT * FROM CTE

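3 answers:

Answer 0: (score: 3)

Another way to do this is with lag. Get the previous Onleave value for each employeeid, then assign groups with a running sum that starts a new group whenever the current value differs from the previous one.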

select employeeid,min(date) as date_from,max(date) as date_to,max(onleave) as onleave
from (select t.*,sum(case when prev_ol=onleave then 0 else 1 end) over(partition by employeeid order by date) as grp
      from (select c.*,lag(onleave,1,onleave) over(partition by employeeid order by date) as prev_ol
            from cte c
           ) t
      ) t
group by employeeid,grp 

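Answer 1: (score: 2)

This is an example of the gaps-and-islands problem. In this case you can use date arithmetic. The key observation is that subtracting a sequence of integers from the date column identifies islands of similar values: within a run of consecutive dates with the same OnLeave value, the difference stays constant.

As a query, this looks like: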

SELECT EmployeeId, MIN([Date]) as DateValidFrom, MAX([Date]) as DateValidTo,
       OnLeave
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY EmployeeId, OnLeave ORDER BY [Date]) as seqnum
      FROM t
     ) t
GROUP BY EmployeeID, DATEADD(day, - seqnum, [Date]), OnLeave;

You can run the subquery, look at the results, and then do the arithmetic to see how it works.
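
As a rough illustration of that arithmetic, the sketch below runs the subquery on a shortened, hypothetical sample and adds the subtracted date as an extra column. The alias island_key and the five-row sample are assumptions for illustration only. Note that the approach presumably relies on the dates being consecutive; a missing day would split a run into two islands.

```sql
-- Hypothetical shortened sample: OnLeave goes 0, 0, 1, 1, 0.
WITH t (EmployeeID, [Date], OnLeave) AS (
    SELECT v.EmployeeID, CAST(v.[Date] AS date), v.OnLeave
    FROM (VALUES
        ('ABH12345', '2016-01-01', 0),
        ('ABH12345', '2016-01-02', 0),
        ('ABH12345', '2016-01-03', 1),
        ('ABH12345', '2016-01-04', 1),
        ('ABH12345', '2016-01-05', 0)
    ) AS v (EmployeeID, [Date], OnLeave)
)
SELECT s.*,
       -- Date minus per-value sequence number: constant within each island.
       DATEADD(day, -s.seqnum, s.[Date]) AS island_key
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY EmployeeID, OnLeave ORDER BY [Date]) AS seqnum
      FROM t
     ) s
ORDER BY s.EmployeeID, s.[Date];
-- island_key per row: 2015-12-31, 2015-12-31, 2016-01-02, 2016-01-02, 2016-01-02
```

In this sample the last OnLeave = 0 row shares its island_key with the two OnLeave = 1 rows, which is why OnLeave has to stay in the GROUP BY alongside the subtracted date.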

Here is an example.

Answer 2: (score: 2)

Here is another, simpler way to get the desired output that accesses the table only once.

-- sample of data from your question
with t1(EmployeeID, Date1, Onleave) as(
  select 'ABH12345', cast('2016-01-01' as date),  0 union all
  select 'ABH12345', cast('2016-01-02' as date),  0 union all
  select 'ABH12345', cast('2016-01-03' as date),  0 union all
  select 'ABH12345', cast('2016-01-04' as date),  0 union all
  select 'ABH12345', cast('2016-01-05' as date),  0 union all
  select 'ABH12345', cast('2016-01-06' as date),  0 union all
  select 'ABH12345', cast('2016-01-07' as date),  0 union all
  select 'ABH12345', cast('2016-01-08' as date),  0 union all
  select 'ABH12345', cast('2016-01-09' as date),  0 union all
  select 'ABH12345', cast('2016-01-10' as date),  1 union all
  select 'ABH12345', cast('2016-01-11' as date),  1 union all
  select 'ABH12345', cast('2016-01-12' as date),  1 union all
  select 'ABH12345', cast('2016-01-13' as date),  1 union all
  select 'ABH12345', cast('2016-01-14' as date),  0 union all
  select 'ABH12345', cast('2016-01-15' as date),  0 union all
  select 'ABH12345', cast('2016-01-16' as date),  0 union all
  select 'ABH12345', cast('2016-01-17' as date),  0
)
-- actual query
select max(w.employeeid) as employeeid
     , min(w.date1)      as datevalidfrom
     , max(w.date1)      as datevalidto
     , max(w.onleave)    as onleave 
  from (
        select row_number() over(partition by employeeid order by date1) -
               row_number() over(partition by employeeid, onleave order by date1) as grp
             , employeeid
             , date1
             , onleave
          from t1 s
        ) w
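-- grp alone is not guaranteed to be unique across employees or onleave values
group by w.employeeid, w.onleave, w.grp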
order by employeeid, datevalidfrom

Result:

employeeid datevalidfrom datevalidto onleave
---------- ------------- ----------- -----------
ABH12345   2016-01-01    2016-01-09  0
ABH12345   2016-01-10    2016-01-13  1
ABH12345   2016-01-14    2016-01-17  0
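
One caveat worth checking on other data: the difference of the two row numbers is not necessarily unique across Onleave values, so grouping by it alone can merge adjacent runs. The sketch below uses a shortened, hypothetical sample where this happens and groups by EmployeeID, Onleave and grp together; the CTE name numbered is an assumption for illustration.

```sql
-- Hypothetical shortened sample: Onleave goes 0, 0, 1, 1, 0.
WITH t1 (EmployeeID, Date1, Onleave) AS (
    SELECT v.EmployeeID, CAST(v.Date1 AS date), v.Onleave
    FROM (VALUES
        ('ABH12345', '2016-01-01', 0),
        ('ABH12345', '2016-01-02', 0),
        ('ABH12345', '2016-01-03', 1),
        ('ABH12345', '2016-01-04', 1),
        ('ABH12345', '2016-01-05', 0)
    ) AS v (EmployeeID, Date1, Onleave)
),
numbered AS (
    -- Overall row number minus per-value row number.
    -- For this sample grp is 0, 0, 2, 2, 2: the last two runs share grp = 2.
    SELECT EmployeeID, Date1, Onleave,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY Date1)
         - ROW_NUMBER() OVER (PARTITION BY EmployeeID, Onleave ORDER BY Date1) AS grp
    FROM t1
)
SELECT EmployeeID,
       MIN(Date1) AS DateValidFrom,
       MAX(Date1) AS DateValidTo,
       Onleave
FROM numbered
-- Including Onleave keeps the runs with a shared grp value apart.
GROUP BY EmployeeID, Onleave, grp
ORDER BY EmployeeID, DateValidFrom;
```

With Onleave in the GROUP BY, this sample yields three rows (2016-01-01 to 2016-01-02, 2016-01-03 to 2016-01-04, and 2016-01-05 alone); grouping by grp alone would collapse the last two into one.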