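I am trying to write a function that identifies groups of consecutive dates and measures the size of each group.
Until now I have been doing this procedurally in Python, but I would like to move it into SQL.
For example, the list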
Bill 01/01/2011
Bill 02/01/2011
Bill 03/01/2011
Bill 05/01/2011
Bill 07/01/2011
should be output into a new table as:
Bill 01/01/2011 3
Bill 02/01/2011 3
Bill 03/01/2011 3
Bill 05/01/2011 1
Bill 07/01/2011 1
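Ideally, this should also account for weekends and public holidays: the dates in my table will always fall on Monday to Friday (I think I could handle this by building a separate table of working days and numbering them sequentially). Someone at work suggested I try a CTE. I am new to this, so I would appreciate any guidance anyone can offer. Thanks!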
Answer 0 (score: 15)
You can do this with a clever application of window functions. Consider the following:
select name, date, row_number() over (partition by name order by date)
from t
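This adds a row number, which in your example would simply be 1, 2, 3, 4, 5. Now subtract that row number from the date: within a run of consecutive dates, the result is a constant value for the group.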
select name, date,
dateadd(d, - row_number() over (partition by name order by date), date) as val
from t
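To see why the difference stays constant, here is a minimal Python sketch of the same subtraction. It assumes the question's dates are day/month/year and covers a single name; the variable names are illustrative only and not part of the answer.

```python
from datetime import date, timedelta

# Sample dates from the question, read as day/month/year (an assumption).
dates = [date(2011, 1, 1), date(2011, 1, 2), date(2011, 1, 3),
         date(2011, 1, 5), date(2011, 1, 7)]
ordered = sorted(dates)

# Mirror dateadd(d, -row_number(), date): subtract the 1-based row number, in days.
vals = [d - timedelta(days=rn) for rn, d in enumerate(ordered, start=1)]

for d, v in zip(ordered, vals):
    print(d, v)
```

The first three dates all map to the same value, while each isolated date gets a value of its own, so the value can serve as a grouping key.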
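Finally, you want the number of rows in each sequence. I would also add a sequence identifier (for instance, to tell the last two groups apart).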
select name, date,
count(*) over (partition by name, val) as NumInSeq,
dense_rank() over (partition by name order by val) as SeqID
from (select name, date,
dateadd(d, - row_number() over (partition by name order by date), date) as val
from t
) t
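For comparison with the procedural approach mentioned in the question, the final query can be mirrored in Python roughly as follows. This is a sketch for a single name; `sequence_sizes` is a hypothetical helper name, and the dates are assumed to be day/month/year.

```python
from collections import Counter
from datetime import date, timedelta

def sequence_sizes(dates):
    """Return (date, num_in_seq, seq_id) for each date of a single name."""
    ordered = sorted(dates)
    # Same grouping key as the query: date minus its 1-based row number.
    vals = [d - timedelta(days=rn) for rn, d in enumerate(ordered, start=1)]
    sizes = Counter(vals)  # plays the role of count(*) over (partition by val)
    # Plays the role of dense_rank() over (order by val).
    rank = {v: i for i, v in enumerate(sorted(set(vals)), start=1)}
    return [(d, sizes[v], rank[v]) for d, v in zip(ordered, vals)]

sample = [date(2011, 1, 1), date(2011, 1, 2), date(2011, 1, 3),
          date(2011, 1, 5), date(2011, 1, 7)]
for row in sequence_sizes(sample):
    print(row)
```

On the sample data the sizes come out as 3, 3, 3, 1, 1, matching the output requested in the question, and the two single-day groups receive different identifiers.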
Somehow I missed the part about weekdays and holidays. This solution does not address that problem.
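Answer 1 (score: 7)
The following query accounts for weekends and holidays. The query could be written to include the holidays on the fly, but to keep it clearer I simply materialized the holidays into an actual table.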
CREATE TABLE tx
(n varchar(4), d date);
INSERT INTO tx
(n, d)
VALUES
('Bill', '2006-12-29'), -- Friday
-- 2006-12-30 is Saturday
-- 2006-12-31 is Sunday
-- 2007-01-01 is New Year's Holiday
('Bill', '2007-01-02'), -- Tuesday
('Bill', '2007-01-03'), -- Wednesday
('Bill', '2007-01-04'), -- Thursday
('Bill', '2007-01-05'), -- Friday
-- 2007-01-06 is Saturday
-- 2007-01-07 is Sunday
('Bill', '2007-01-08'), -- Monday
('Bill', '2007-01-09'), -- Tuesday
('Bill', '2012-07-09'), -- Monday
('Bill', '2012-07-10'), -- Tuesday
('Bill', '2012-07-11'); -- Wednesday
create table holiday(d date);
insert into holiday(d) values
('2007-01-01');
/* query should return 7 consecutive good
attendance (from December 29 2006 to January 9 2007) */
/* and 3 consecutive attendance from July 9 2012 to July 11 2012. */
Query:
with first_date as
(
-- get the monday of the earliest date
select dateadd( ww, datediff(ww,0,min(d)), 0 ) as first_date
from tx
)
,shifted as
(
select
tx.n, tx.d,
diff = datediff(day, fd.first_date, tx.d)
- (datediff(day, fd.first_date, tx.d)/7 * 2)
from tx
cross join first_date fd
union
select
xxx.n, h.d,
diff = datediff(day, fd.first_date, h.d)
- (datediff(day, fd.first_date, h.d)/7 * 2)
from holiday h
cross join first_date fd
cross join (select distinct n from tx) as xxx
)
,grouped as
(
select *, grp = diff - row_number() over(partition by n order by d)
from shifted
)
select
d, n, dense_rank() over (partition by n order by grp) as nth_streak
,count(*) over (partition by n, grp) as streak
from grouped
where d not in (select d from holiday) -- remove the holidays
Output:
| D | N | NTH_STREAK | STREAK |
-------------------------------------------
| 2006-12-29 | Bill | 1 | 7 |
| 2007-01-02 | Bill | 1 | 7 |
| 2007-01-03 | Bill | 1 | 7 |
| 2007-01-04 | Bill | 1 | 7 |
| 2007-01-05 | Bill | 1 | 7 |
| 2007-01-08 | Bill | 1 | 7 |
| 2007-01-09 | Bill | 1 | 7 |
| 2012-07-09 | Bill | 2 | 3 |
| 2012-07-10 | Bill | 2 | 3 |
| 2012-07-11 | Bill | 2 | 3 |
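Live test: http://www.sqlfiddle.com/#!3/815c5/1
The main logic of the query is to shift every date back by two days for each full week elapsed, which closes the weekend gaps. This is done by dividing the day number by 7 (integer division), multiplying by 2, and subtracting the result from the original number. For example, if a date falls on day 15, the computation is 15/7 * 2 == 4; subtracting 4 from the original number gives 15 - 4 == 11, so day 15 becomes day 11. Likewise, day 8 becomes day 6: 8 - (8/7 * 2) == 6.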
Weekends are not attended (e.g. days 6, 7, 13, 14):
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15
Applying the computation to all the weekday numbers yields the following values:
1 2 3 4 5
6 7 8 9 10
11
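The arithmetic is easy to check in a few lines of Python. The helper name `shift` is hypothetical, and `//` stands in for the integer division that the SQL performs on integer operands.

```python
def shift(day_number):
    """Remove two weekend days for every full week elapsed (integer division)."""
    return day_number - (day_number // 7) * 2

# Weekday numbers from the calendar illustration (weekends 6, 7, 13, 14 omitted).
weekdays = [1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 15]
shifted = [shift(n) for n in weekdays]
print(shifted)
```

The shifted values form an unbroken run from 1 to 11, which is what lets the row-number subtraction treat a Friday and the following Monday as adjacent.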
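For holidays, slot them into the attendance data so that consecutiveness is easy to determine, then remove them in the final query. The attendance illustrated above yields 11 consecutive days of good attendance.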
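Putting the weekend shift and the holiday slotting together, the query's logic can be sketched in Python for a single name as below. This is an illustration under stated assumptions, not the answer's implementation: `streaks` is a hypothetical helper, and holidays are assumed to fall on weekdays no earlier than the first attended week.

```python
from collections import Counter
from datetime import date, timedelta

def streaks(attended, holidays):
    """Return (date, nth_streak, streak) per attended date for one name."""
    first = min(attended)
    monday = first - timedelta(days=first.weekday())  # Monday of the earliest week
    days = sorted(set(attended) | set(holidays))      # slot the holidays in
    grp = {}
    for rn, d in enumerate(days, start=1):
        offset = (d - monday).days
        diff = offset - (offset // 7) * 2             # close the weekend gaps
        grp[d] = diff - rn                            # constant within a streak
    kept = [d for d in days if d not in holidays]     # remove the holidays again
    size = Counter(grp[d] for d in kept)
    rank = {g: i for i, g in enumerate(sorted({grp[d] for d in kept}), start=1)}
    return [(d, rank[grp[d]], size[grp[d]]) for d in kept]

attended = [date(2006, 12, 29), date(2007, 1, 2), date(2007, 1, 3),
            date(2007, 1, 4), date(2007, 1, 5), date(2007, 1, 8),
            date(2007, 1, 9), date(2012, 7, 9), date(2012, 7, 10),
            date(2012, 7, 11)]
for row in streaks(attended, {date(2007, 1, 1)}):
    print(row)
```

With the sample rows from the `tx` table and the single New Year holiday, this gives a first streak of 7 and a second streak of 3, in line with the output table above.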
A detailed explanation of the query logic is here: http://www.ienablemuch.com/2012/07/monitoring-perfect-attendance.html