复杂的海峡和群岛'问题

时间:2016-08-16 16:12:37

标签: sql postgresql window-functions gaps-and-islands

我在Postgres DB中有一个表格,如下所示:

person    |     eventdate     |  type 
--------------------------------------
<uuid-1>  |   2016-05-14      |   300
<uuid-3>  |   2016-05-14      |   300
<uuid-1>  |   2016-05-15      |   301
<uuid-1>  |   2016-05-16      |   301
<uuid-1>  |   2016-05-18      |   304
<uuid-1>  |   2016-05-22      |   300
<uuid-2>  |   2016-05-22      |   304
<uuid-2>  |   2016-05-27      |   301
<uuid-1>  |   2016-05-30      |   300
<uuid-1>  |   2016-06-01      |   300
<uuid-2>  |   2016-06-15      |   501
<uuid-2>  |   2016-06-16      |   301
<uuid-4>  |   2016-06-16      |   300
<uuid-5>  |   2016-06-20      |   300
<uuid-1>  |   2016-06-21      |   300
<uuid-2>  |   2016-06-21      |   300
<uuid-2>  |   2016-06-23      |   301
<uuid-2>  |   2016-06-30      |   300
<uuid-3>  |   2016-06-30      |   300
<uuid-4>  |   2016-06-30      |   300

该表包含因缺勤原因(类型)而员工缺勤的每一天的非连续日期条目。 但是,缺席期可能会持续数天,并且在之前缺席相同类型的5天内缺席仍然被认为是相同缺席期间的一部分&#39;。

我需要获得每个员工缺勤期的产出,包括这些期间的开始和结束日期,以及该多日期跨期的总天数。

由于本报告的目的认为不同类型的缺席是相同的,因此情况更加复杂。因此,在上面的示例中,类型300,301,304将被视为相同。

所以从我上面的例子中可以看出以下内容......

person    |     startdate     |       enddate     |  days   |  type 
--------------------------------------------------------------------
<uuid-1>  |     2016-05-14    |     2016-05-22    |   5     |   300
<uuid-3>  |     2016-05-14    |     2016-04-14    |   1     |   300
<uuid-2>  |     2016-05-22    |     2016-04-27    |   2     |   304
<uuid-1>  |     2016-05-30    |     2016-06-01    |   2     |   300
<uuid-2>  |     2016-06-15    |     2016-06-15    |   1     |   501
<uuid-2>  |     2016-06-16    |     2016-06-16    |   1     |   301
<uuid-4>  |     2016-06-16    |     2016-06-16    |   1     |   300
<uuid-5>  |     2016-06-20    |     2016-06-20    |   1     |   300
<uuid-1>  |     2016-06-21    |     2016-06-21    |   1     |   300
<uuid-2>  |     2016-06-21    |     2016-06-23    |   2     |   300
<uuid-2>  |     2016-06-30    |     2016-06-30    |   1     |   300
<uuid-3>  |     2016-06-30    |     2016-06-30    |   1     |   300
<uuid-4>  |     2016-06-30    |     2016-06-30    |   1     |   300

如何在此表中查询此输出?

1 个答案:

答案 0 :(得分:2)

目前还不清楚如何确定每个时期的type。我选择了最小数字。

假设这个基本的表定义:

CREATE TABLE tbl (person text, eventdate date, type int);

基本上,我建议在两个嵌套子查询中window functions来识别同一时期(岛)的成员。聚合:

SELECT person, period
     , min(eventdate) AS startdate
     , max(eventdate) AS enddate
     , count(*)       AS days
     , min(type)      AS type
FROM  (
   SELECT person, eventdate, type
        , count(gap) OVER (PARTITION BY person ORDER BY eventdate) AS period
   FROM  (
      SELECT person, eventdate, type
           , CASE WHEN lag(eventdate) OVER (PARTITION BY person ORDER BY eventdate)
                     > eventdate - 6  -- within 5 days
                  THEN NULL           -- same period
                  ELSE TRUE           -- next period
             END AS gap
      FROM   tbl
      ) sub
   ) sub
GROUP  BY person, period
ORDER  BY person, period;

结果(基于您的示例数据):

  person  | period | startdate  |  enddate   | days | type
----------+--------+------------+------------+------+------
 <uuid-1> |      1 | 2016-05-14 | 2016-05-22 |    5 |  300
 <uuid-1> |      2 | 2016-05-30 | 2016-06-01 |    2 |  300
 <uuid-1> |      3 | 2016-06-21 | 2016-06-21 |    1 |  300
 <uuid-2> |      1 | 2016-05-22 | 2016-05-27 |    2 |  301
 <uuid-2> |      2 | 2016-06-15 | 2016-06-23 |    4 |  300
 <uuid-2> |      3 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-3> |      1 | 2016-05-14 | 2016-05-14 |    1 |  300
 <uuid-3> |      2 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-4> |      1 | 2016-06-16 | 2016-06-16 |    1 |  300
 <uuid-4> |      2 | 2016-06-30 | 2016-06-30 |    1 |  300
 <uuid-5> |      1 | 2016-06-20 | 2016-06-20 |    1 |  300

如果同一个人的同一天可以使用不同的类型多次输入,并且您只想计算不同的天,请将其设为:count(DISTINCT eventdate) AS days

相关,详细说明:

BTW,eventdate - 6适用于数据类型date,但不适用于timestamp