Gaps and islands in SQL

Asked: 2012-08-16 09:04:16

Tags: sql-server sql-server-2008 tsql gaps-and-islands

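I have been struggling with a problem that should be pretty simple, but after a full week of reading, googling, experimenting and so on, my colleague and I still cannot find a proper solution. :(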

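The problem: we have a table that holds two values:

  • Employee number (P_ID, int) <--- identifies the employee
  • Date (starttime, datetime) <--- the time the employee checked in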

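  • We need to know the periods during which each employee has been working.
  • When two dates are less than @gap days apart, they belong to the same period.
  • An employee can have multiple records on any given day, but I only need to know which dates he worked; I am not interested in the time part.
  • As soon as there is a gap of more than @gap days, the next date is considered the start of a new range.
  • A range is at least 1 day long (example: 21-09-2011 | 21-09-2011) but has no maximum length. (An employee who checks in every @gap - 1 days should produce a single period from the first day he checked in until today.)

We think what we need are the islands in this table, where islands are separated by a gap in days greater than @variable (@gap = 30 means 30 days).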
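As a small illustration of the rule above, here is a minimal sketch that compares two pairs of dates from employee 12345 in the sample data against @gap. It assumes the time part has already been dropped, and it treats a difference of exactly @gap days as still belonging to the same period; the question does not say which way that boundary case should go.

```sql
declare @gap int;
set @gap = 30;

-- Dates taken from employee 12345 in the sample data (time part already dropped).
select
    days_apart_1 = datediff(day, '20110628', '20110721'),  -- 23, not more than @gap: same period
    days_apart_2 = datediff(day, '20110721', '20110921'),  -- 62, more than @gap: a new period starts
    -- Assumption: exactly @gap days apart still counts as the same period.
    starts_new_period = case when datediff(day, '20110721', '20110921') > @gap then 1 else 0 end;
```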

Here is an example:

sourceTable:

P_ID  | starttime
------|------------------
12121 | 24-03-2009 7:30
12121 | 24-03-2009 14:25 
12345 | 27-06-2011 10:00
99999 | 01-05-2012 4:50 
12345 | 27-06-2011 10:30
12345 | 28-06-2011 11:00
98765 | 13-04-2012 10:00
12345 | 21-07-2011 9:00
99999 | 03-05-2012 23:15
12345 | 21-09-2011 12:00
45454 | 12-07-2010 8:00
12345 | 21-09-2011 17:00
99999 | 06-05-2012 11:05
99999 | 20-05-2012 12:45
98765 | 26-04-2012 16:00
12345 | 07-07-2012 14:00
99999 | 01-06-2012 13:55
12345 | 13-08-2012 13:00

What I need is:

Periods:

P_ID  |   Start    |    End
-------------------------------
12121 | 24-03-2009 | 24-03-2009
12345 | 27-06-2011 | 21-07-2011
12345 | 21-09-2011 | 21-09-2011
12345 | 07-07-2012 | (today) OR 13-08-2012  <-- (less than @gap days ago) OR (last date in table)
45454 | 12-07-2010 | 12-07-2010
45454 | 17-06-2012 | 17-06-2012 
98765 | 13-04-2012 | 26-04-2012
99999 | 01-05-2012 | 01-06-2012

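I hope this is clear. Thank you already for reading this, and it would be great if you could contribute!

2 answers:

Answer 0 (score: 1)

I put together a rough script that should get you started. I did not bother stripping the time part from the datetimes, and the endpoint comparisons may need adjusting.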

select 
    P_ID,
    src.starttime,
    endtime = case when src.starttime <> lst.starttime or lst.starttime < DATEADD(dd,-1 * @gap,GETDATE()) then lst.starttime else GETDATE() end,
    frst.starttime,
    lst.starttime
from @SOURCETABLE src
outer apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > DATEADD(dd,-1 * @gap,src.starttime)) frst
outer apply (select starttime = MAX(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and src.starttime > DATEADD(dd,-1 * @gap,sub.starttime)) lst
where src.starttime = frst.starttime
order by P_ID, src.starttime

I get the following output, which differs from yours, but I think it is fine:

P_ID        starttime               endtime                 starttime               starttime
----------- ----------------------- ----------------------- ----------------------- -----------------------
12121       2009-03-24 07:30:00.000 2009-03-24 14:25:00.000 2009-03-24 07:30:00.000 2009-03-24 14:25:00.000
12345       2011-06-27 10:00:00.000 2011-07-21 09:00:00.000 2011-06-27 10:00:00.000 2011-07-21 09:00:00.000
12345       2011-09-21 12:00:00.000 2011-09-21 17:00:00.000 2011-09-21 12:00:00.000 2011-09-21 17:00:00.000
12345       2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000
12345       2012-08-13 13:00:00.000 2012-08-16 11:23:25.787 2012-08-13 13:00:00.000 2012-08-13 13:00:00.000
45454       2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000
98765       2012-04-13 10:00:00.000 2012-04-26 16:00:00.000 2012-04-13 10:00:00.000 2012-04-26 16:00:00.000

The last two output columns are the results of the outer apply parts and are only there for debugging.

This is based on the following setup:

declare @gap int
set @gap = 30

set dateformat dmy
-----P_ID----|----starttime----
declare @SOURCETABLE table (P_ID int, starttime datetime)
insert @SourceTable values 
(12121,'24-03-2009 7:30'),
(12121,'24-03-2009 14:25'),
(12345,'27-06-2011 10:00'),
(12345,'27-06-2011 10:30'),
(12345,'28-06-2011 11:00'),
(98765,'13-04-2012 10:00'),
(12345,'21-07-2011 9:00'),
(12345,'21-09-2011 12:00'),
(45454,'12-07-2010 8:00'),
(12345,'21-09-2011 17:00'),
(98765,'26-04-2012 16:00'),
(12345,'07-07-2012 14:00'),
(12345,'13-08-2012 13:00')

Update: I rethought this a bit. The query now uses a CTE to compute the gap before and after each entry, and then aggregates:

--Get the gap between each starttime and the next and prev (use 999 to indicate non-closed intervals)
;WITH CTE_Gaps As ( 
    select
        p_id,
        src.starttime,
        nextgap = coalesce(DATEDIFF(dd,src.starttime,nxt.starttime),999), --Gap to the next entry
        prevgap = coalesce(DATEDIFF(dd,prv.starttime,src.starttime),999), --Gap to the previous entry
        isold = case when DATEDIFF(dd,src.starttime,getdate()) > @gap then 1 else 0 end --Is starttime more than gap days ago?
    from
        @SOURCETABLE src
        cross apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > src.starttime) nxt
        cross apply (select starttime = max(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime < src.starttime) prv   
)
--select * from CTE_Gaps
select
        p_id,
        starttime = min(gap.starttime),
        endtime = nxt.starttime
    from
        CTE_Gaps gap
        --Find the next starttime where its gap to the next > @gap
        cross apply (select starttime = MIN(sub.starttime) from CTE_Gaps sub where gap.p_id = sub.p_id and sub.starttime >= gap.starttime and sub.nextgap > @gap) nxt
group by P_ID, nxt.starttime
order by P_ID, nxt.starttime
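The same idea (find the rows that open a period and the rows that close one, then pair them up) can also be sketched with ROW_NUMBER, which is available on SQL Server 2008. This is only a sketch under assumptions, not the query from this answer: @checkins is a hypothetical stand-in for the source table, a difference of exactly @gap days is treated as the same period, and the "open until today" rule from the question is not handled.

```sql
declare @gap int;
set @gap = 30;

-- Hypothetical stand-in for the source table, filled with a few sample rows.
declare @checkins table (P_ID int, starttime datetime);
insert @checkins values
(12345, '20110627 10:00'), (12345, '20110627 10:30'), (12345, '20110628 11:00'),
(12345, '20110721 09:00'), (12345, '20110921 12:00'), (12345, '20110921 17:00'),
(12345, '20120707 14:00'), (12345, '20120813 13:00'),
(98765, '20120413 10:00'), (98765, '20120426 16:00');

;with Days as (
    -- One row per employee per calendar day; the time part is dropped.
    select distinct P_ID, workday = cast(starttime as date)
    from @checkins
),
Numbered as (
    select P_ID, workday,
           rn = row_number() over (partition by P_ID order by workday)
    from Days
),
Starts as (
    -- A day opens a period when there is no earlier day
    -- or the previous day is more than @gap days back.
    select cur.P_ID, cur.workday,
           island = row_number() over (partition by cur.P_ID order by cur.workday)
    from Numbered cur
    left join Numbered prv on prv.P_ID = cur.P_ID and prv.rn = cur.rn - 1
    where prv.workday is null or datediff(day, prv.workday, cur.workday) > @gap
),
Ends as (
    -- A day closes a period when there is no later day
    -- or the next day is more than @gap days ahead.
    select cur.P_ID, cur.workday,
           island = row_number() over (partition by cur.P_ID order by cur.workday)
    from Numbered cur
    left join Numbered nxt on nxt.P_ID = cur.P_ID and nxt.rn = cur.rn + 1
    where nxt.workday is null or datediff(day, cur.workday, nxt.workday) > @gap
)
-- The n-th start of an employee belongs to the n-th end of that employee.
select s.P_ID, period_start = s.workday, period_end = e.workday
from Starts s
join Ends e on e.P_ID = s.P_ID and e.island = s.island
order by s.P_ID, s.workday;
```

Because every period has exactly one opening day and one closing day, numbering both sets per employee lines them up without a correlated MIN/MAX lookup per row. For employee 12345 the sample rows give four periods, since 07-07-2012 and 13-08-2012 are 37 days apart.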

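Answer 1 (score: 0)

Jon most definitely pointed us in the right direction. Performance was terrible, though (the database holds more than 4 million records), and it looked like we were still missing some information. With everything we learned from you, we came up with the following solution. It uses elements of all the suggested answers and cycles through 3 temp tables before finally spitting out the results, but the performance is good enough, and so is the data it produces.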

declare @gap int
declare @Employee_id int

set @gap = 30   
set dateformat dmy
--------------------------------------------------------------- #temp1 --------------------------------------------------
CREATE TABLE #temp1 ( EmployeeID int, starttime date)
INSERT INTO #temp1 ( EmployeeID, starttime)

select distinct ck.Employee_id, 
                cast(ck.starttime as date)
from SERVER1.DB1.dbo.checkins ck
        inner join SERVER1.DB1.dbo.Team t on ck.team_id = t.id
where t.productive = 1

--------------------------------------------------------------- #temp2 --------------------------------------------------

create table #temp2 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, FIRSTCHECKIN datetime)
INSERT INTO #temp2 

select Row_number() OVER (partition by EmployeeID ORDER BY t.prev) + 1 as ROWNR,
             EmployeeID,
             DATEADD(DAY, 1, t.Prev) AS start_gap,
           DATEADD(DAY, 0, t.next) AS end_gap
from 
             (
                    select a.EmployeeID,
                                  a.starttime as Prev, 
                                  (
                                  select min(b.starttime)
                                  from #temp1 as b
                                  where starttime > a.starttime and b.EmployeeID = a.EmployeeID 
                                  ) as Next
from #temp1 as a) as t

where  datediff(day, prev, next ) > @gap
group by     EmployeeID,
                    t.Prev,
                    t.next
union -- add first known date for Employee 

select      1 as ROWNR,
            EmployeeID,
            NULL,
            min(starttime)
from #temp1 ct
group by ct.EmployeeID

--------------------------------------------------------------- #temp3 --------------------------------------------------

create table #temp3 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, STARTOFCHECKIN datetime)
INSERT INTO #temp3

select  ROWNR,
        Employeeid,
        ENDOFCHECKIN,
        FIRSTCHECKIN
from #temp2 

union -- add last known date for Employee 

select       (select count(*) from #temp2 b where Employeeid = ct.Employeeid)+1 as ROWNR,
             ct.Employeeid,
            (select dateadd(d,1,max(starttime)) from #temp1 c where Employeeid = ct.Employeeid),
             NULL
from #temp2 ct
group by ct.EmployeeID

---------------------------------------finally check our data-------------------------------------------------


select              a1.Employeeid,
                    a1.STARTOFCHECKIN as STARTOFCHECKIN,
                    ENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN a1.ENDOFCHECKIN ELSE b1.ENDOFCHECKIN END,
                    year(a1.STARTOFCHECKIN) as JaarSTARTOFCHECKIN,
                    JaarENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN  year(a1.ENDOFCHECKIN) ELSE  year(b1.ENDOFCHECKIN) END,
                    Month(a1.STARTOFCHECKIN) as MaandSTARTOFCHECKIN,
                    MaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN  month(a1.ENDOFCHECKIN) ELSE  month(b1.ENDOFCHECKIN) END,
                    (year(a1.STARTOFCHECKIN)*100)+month(a1.STARTOFCHECKIN) as JaarMaandSTARTOFCHECKIN,
                    JaarMaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN (year(a1.ENDOFCHECKIN)*100)+month(a1.ENDOFCHECKIN) ELSE (year(b1.ENDOFCHECKIN)*100)+month(b1.ENDOFCHECKIN) END,
                    datediff(M,a1.STARTOFCHECKIN,b1.ENDOFCHECKIN) as MONTHSCHECKEDIN
from #temp3 a1
       full outer join #temp3 b1 on a1.ROWNR = b1.ROWNR -1 and a1.Employeeid = b1.Employeeid
where not (a1.STARTOFCHECKIN is null AND b1.ENDOFCHECKIN is null) 
order by a1.Employeeid, a1.STARTOFCHECKIN
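For readers who can use SQL Server 2012 or later (an assumption, since the question is tagged sql-server-2008), the period detection can be sketched in a single pass with LAG and a running SUM instead of temp tables. This is a hedged sketch rather than a drop-in replacement for the script above: @days is a hypothetical stand-in for #temp1, a difference of exactly @gap days counts as the same period, and the extra year/month columns are left out.

```sql
declare @gap int = 30;

-- Hypothetical stand-in for #temp1: one row per employee per calendar day.
declare @days table (EmployeeID int, workday date);
insert @days values
(12345, '20110627'), (12345, '20110628'), (12345, '20110721'),
(12345, '20110921'), (12345, '20120707'), (12345, '20120813'),
(98765, '20120413'), (98765, '20120426');

;with Flagged as (
    -- is_start = 1 when there is no previous day (LAG returns NULL)
    -- or the previous day is more than @gap days back.
    select EmployeeID, workday,
           is_start = case
                          when datediff(day,
                                        lag(workday) over (partition by EmployeeID order by workday),
                                        workday) <= @gap
                          then 0 else 1
                      end
    from @days
),
Grouped as (
    -- The running total of start flags gives each period its own number.
    select EmployeeID, workday,
           island = sum(is_start) over (partition by EmployeeID
                                        order by workday
                                        rows unbounded preceding)
    from Flagged
)
select EmployeeID,
       STARTOFCHECKIN = min(workday),
       ENDOFCHECKIN   = max(workday)
from Grouped
group by EmployeeID, island
order by EmployeeID, STARTOFCHECKIN;
```

Each day is compared only with the day immediately before it, so no correlated subquery per row is needed; whether that is actually faster on 4 million rows would have to be measured.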