Question

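I want to determine the number of consecutive absences from the table below. My initial research suggests I can do this with window functions. For the data provided, the longest streak is four absences in a row. Could you tell me how to get the running absence total as a separate column?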

create table events (eventdate date, absence int);

insert into events values ('2014-10-01', 0);
insert into events values ('2014-10-08', 1);
insert into events values ('2014-10-15', 1);
insert into events values ('2014-10-22', 0);
insert into events values ('2014-11-05', 0);
insert into events values ('2014-11-12', 1);
insert into events values ('2014-11-19', 1);
insert into events values ('2014-11-26', 1);
insert into events values ('2014-12-03', 1);
insert into events values ('2014-12-10', 0);

Answer 1

You didn't specify which RDBMS you're using, but the following works with PostgreSQL's window functions and should translate to similar SQL engines:

SELECT eventdate,
       absence,
       -- XXX We take advantage of the fact that absence is an int (1 or 0)
       --     otherwise we'd COUNT(1) OVER (...) and only conditionally
       --     display the count if absence = 1
       SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
         AS consecutive_absences
  FROM (SELECT spanstarts.*,
               SUM(newspan) OVER (ORDER BY eventdate) AS span
          FROM (SELECT events.*,
                CASE LAG(absence) OVER (ORDER BY eventdate)
                  WHEN absence THEN NULL
                  ELSE 1 END AS newspan
                  FROM events)
                spanstarts
        ) eventsspans
ORDER BY eventdate;

This gives you:

 eventdate  | absence | consecutive_absences 
------------+---------+----------------------
 2014-10-01 |       0 |                    0
 2014-10-08 |       1 |                    1
 2014-10-15 |       1 |                    2
 2014-10-22 |       0 |                    0
 2014-11-05 |       0 |                    0
 2014-11-12 |       1 |                    1
 2014-11-19 |       1 |                    2
 2014-11-26 |       1 |                    3
 2014-12-03 |       1 |                    4
 2014-12-10 |       0 |                    0
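To check Answer 1 without a PostgreSQL server, the sketch below runs the same query through Python's built-in `sqlite3` module against the question's sample data. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the helper name `run_answer1_query` is hypothetical.

```python
import sqlite3

# The question's sample data: (eventdate, absence), where absence 1 = absent.
ROWS = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]

# Answer 1's query with its comment removed; otherwise unchanged.
QUERY = """
SELECT eventdate,
       absence,
       SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
         AS consecutive_absences
  FROM (SELECT spanstarts.*,
               SUM(newspan) OVER (ORDER BY eventdate) AS span
          FROM (SELECT events.*,
                       CASE LAG(absence) OVER (ORDER BY eventdate)
                         WHEN absence THEN NULL
                         ELSE 1 END AS newspan
                  FROM events) spanstarts
       ) eventsspans
 ORDER BY eventdate
"""


def run_answer1_query(rows):
    """Load rows into an in-memory events table and run Answer 1's query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (eventdate date, absence int)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in run_answer1_query(ROWS):
    print(row)
```

The third column of each tuple should match the `consecutive_absences` column shown above, and its maximum is the longest streak.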

There is a good dissection of the approach above on the pgsql-general mailing list. In short:

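1. The innermost query (spanstarts) uses LAG to find the start of each new span of absence values, whether it is a span of 1s or a span of 0s.
2. The next query (eventsspans) numbers these spans by summing the count of new spans that came before each row. So we find span 1, then span 2, then 3, and so on.
3. The outer query counts the absences within each span.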

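As the SQL comment says, step 3 takes advantage of absence's data type (it is 0 or 1, so summing it counts absences), but the net effect is the same.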
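The three query levels can be mirrored step by step in plain Python. This is a hypothetical illustration (the function name `spans_and_running_totals` is made up), assuming the input list is already sorted by `eventdate`. It uses 0 where the SQL uses NULL for `newspan`, which gives the same running sum.

```python
from itertools import accumulate


def spans_and_running_totals(absences):
    """Mirror Answer 1's three query levels for 0/1 absence values that are
    already sorted by eventdate. Returns (span ids, running totals)."""
    # Level 1 (spanstarts): mark a new span whenever the value differs from
    # the previous row's value; the first row always starts a span because
    # LAG() returns NULL there. The SQL uses NULL for "no new span"; we use 0.
    newspan = [1 if i == 0 or a != absences[i - 1] else 0
               for i, a in enumerate(absences)]
    # Level 2 (eventsspans): a running SUM(newspan) numbers the spans 1, 2, 3...
    span = list(accumulate(newspan))
    # Level 3 (outer query): running SUM(absence) within each span,
    # i.e. PARTITION BY span ORDER BY eventdate.
    totals_by_span = {}
    running = []
    for s, a in zip(span, absences):
        totals_by_span[s] = totals_by_span.get(s, 0) + a
        running.append(totals_by_span[s])
    return span, running


span, running = spans_and_running_totals([0, 1, 1, 0, 0, 1, 1, 1, 1, 0])
print(span)     # [1, 2, 2, 3, 3, 4, 4, 4, 4, 5]
print(running)  # [0, 1, 2, 0, 0, 1, 2, 3, 4, 0]
```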

Answer 2

Based on Gordon Linoff's answer here, you can do this:

-- TOP 1 is SQL Server syntax; other engines use LIMIT 1 at the end instead
SELECT TOP 1
        MIN(eventdate) AS spanStart ,
        MAX(eventdate) AS spanEnd,
        COUNT(*) AS spanLength
FROM    ( SELECT    e.* ,
                    ( ROW_NUMBER() OVER ( ORDER BY eventdate )
                      - ROW_NUMBER() OVER ( PARTITION BY absence ORDER BY eventdate ) ) AS grp
          FROM      events e
        ) t
GROUP BY grp ,
        absence
HAVING  absence = 1
ORDER BY COUNT(*) DESC;

Returns:

 spanStart  | spanEnd    | spanLength
------------+------------+------------
 2014-11-12 | 2014-12-03 |          4
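Answer 2 relies on the difference between two ROW_NUMBER() values: within a run of identical absence values, both numbers increase by one per row, so their difference (`grp`) stays constant and labels the run. Runs of 0s and runs of 1s can share a `grp` value, which is why the query groups by both `grp` and `absence`. The minimal sketch below, adapted to SQLite via Python's `sqlite3`, shows the per-row `grp` values and the final result. It assumes SQLite 3.25+, replaces `TOP 1` with `LIMIT 1`, and the function names are hypothetical.

```python
import sqlite3

# The question's sample data: (eventdate, absence).
ROWS = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]

# Per-row run label: overall row number minus row number within the same
# absence value. It is constant inside each run of equal values.
GRP_QUERY = """
SELECT ROW_NUMBER() OVER (ORDER BY eventdate)
     - ROW_NUMBER() OVER (PARTITION BY absence ORDER BY eventdate) AS grp
  FROM events
 ORDER BY eventdate
"""

# Answer 2's query, with LIMIT 1 in place of SQL Server's TOP 1.
LONGEST_QUERY = """
SELECT MIN(eventdate) AS spanStart,
       MAX(eventdate) AS spanEnd,
       COUNT(*)       AS spanLength
  FROM (SELECT e.*,
               ROW_NUMBER() OVER (ORDER BY eventdate)
             - ROW_NUMBER() OVER (PARTITION BY absence ORDER BY eventdate) AS grp
          FROM events e) t
 GROUP BY grp, absence
HAVING absence = 1
 ORDER BY COUNT(*) DESC
 LIMIT 1
"""


def _load(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (eventdate date, absence int)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def grp_values(rows):
    conn = _load(rows)
    values = [g for (g,) in conn.execute(GRP_QUERY)]
    conn.close()
    return values


def longest_absence_run(rows):
    conn = _load(rows)
    result = conn.execute(LONGEST_QUERY).fetchone()
    conn.close()
    return result


print(grp_values(ROWS))           # [0, 1, 1, 2, 2, 3, 3, 3, 3, 6]
print(longest_absence_run(ROWS))  # ('2014-11-12', '2014-12-03', 4)
```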

Answer 3

I don't know which DBMS you're using, but this is from SQL Server. Hope it helps :)

-------------------------------------------------------------------------------------------
Query:

--tableRN numbers the rows of the question's events table in date order
;with tableRN as (SELECT a.*, ROW_NUMBER() OVER (ORDER BY a.eventdate) as rn
                 FROM events a),

--cte is a recursive CTE that walks the rows in order and returns the
--absence value, the level (number of absences in a row so far, minus one),
--rn (row number) and total (count of runs of two or more absences so far)
cte (absence, level, rn, total) AS (
--anchor: start from the first real row rather than a dummy row,
--so the first event is not skipped
SELECT absence, 0, rn, 0 FROM tableRN WHERE rn = 1
UNION ALL 
SELECT r.absence, 
       CASE WHEN c.absence = 1 AND r.absence = 1 THEN c.level + 1
                                                 ELSE 0
       END, 
       r.rn, 
       CASE WHEN c.level = 1 THEN c.total + 1
                             ELSE c.total
       END
FROM cte c JOIN tableRN r ON c.rn + 1 = r.rn)

--This gets you the number of times there 
--was a consecutive absence (twice or more in a row).
--For tables with more than 100 rows, append OPTION (MAXRECURSION 0).
SELECT MAX(c.total) AS Count FROM cte c

-------------------------------------------------------------------------------------------
Results:

|Count|
+-----+
|  2  |
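For comparison with Answer 3, here is a sketch of the same recursive walk ported to SQLite's `WITH RECURSIVE` and run through Python's `sqlite3`. It assumes SQLite 3.25+ (for ROW_NUMBER), starts the recursion from the first real row, and uses the hypothetical helper name `count_streaks_of_two_or_more`.

```python
import sqlite3

# The question's sample data: (eventdate, absence).
ROWS = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]

QUERY = """
WITH RECURSIVE
  tableRN AS (
    SELECT eventdate, absence,
           ROW_NUMBER() OVER (ORDER BY eventdate) AS rn
      FROM events
  ),
  cte(absence, level, rn, total) AS (
    -- Anchor: the first real row.
    SELECT absence, 0, rn, 0 FROM tableRN WHERE rn = 1
    UNION ALL
    -- Recursive step: move on to the next row number.
    SELECT r.absence,
           CASE WHEN c.absence = 1 AND r.absence = 1 THEN c.level + 1 ELSE 0 END,
           r.rn,
           CASE WHEN c.level = 1 THEN c.total + 1 ELSE c.total END
      FROM cte c JOIN tableRN r ON r.rn = c.rn + 1
  )
SELECT MAX(total) FROM cte
"""


def count_streaks_of_two_or_more(rows):
    """Run the recursive walk over rows and return its final total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (eventdate date, absence int)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    (count,) = conn.execute(QUERY).fetchone()
    conn.close()
    return count


print(count_streaks_of_two_or_more(ROWS))  # 2
```

Because `total` is only incremented on the row after a run reaches two absences, a run of exactly two absences that ends on the very last row is not counted. The sample data does not hit that case.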

Answer 4

Create a new column named consecutive_absence_count, defaulting to 0.

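You could write an SQL procedure for inserts: get the latest record, retrieve its absence value, and determine whether the new record being inserted is a presence or an absence.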

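If the latest record and the new record are on consecutive dates and the new record is an absence (absence = 1), set its consecutive_absence_count to the latest record's count plus one; otherwise set it to 0.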
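One way to sketch this insert-time idea is an AFTER INSERT trigger in SQLite, driven from Python's `sqlite3`. It swaps a trigger in for the stored procedure described above; the trigger name `events_streak` and the helper `streak_counts` are hypothetical. It assumes rows are inserted in date order and treats the immediately preceding event as "consecutive" without checking the gap between dates.

```python
import sqlite3

# The question's sample data: (eventdate, absence).
ROWS = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]

SCHEMA = """
CREATE TABLE events (
    eventdate date,
    absence int,
    consecutive_absence_count int DEFAULT 0
);

-- Hypothetical trigger standing in for the insert procedure: an absence
-- gets the previous event's count plus one (the previous count is 0 if that
-- event was a presence); a presence resets the count to 0.
CREATE TRIGGER events_streak AFTER INSERT ON events
BEGIN
    UPDATE events
       SET consecutive_absence_count =
           CASE WHEN NEW.absence = 1
                THEN 1 + COALESCE((SELECT prev.consecutive_absence_count
                                     FROM events prev
                                    WHERE prev.eventdate < NEW.eventdate
                                    ORDER BY prev.eventdate DESC
                                    LIMIT 1), 0)
                ELSE 0
           END
     WHERE rowid = NEW.rowid;
END;
"""


def streak_counts(rows):
    """Insert rows one by one (firing the trigger) and read back the counts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO events (eventdate, absence) VALUES (?, ?)", rows)
    counts = [c for (c,) in conn.execute(
        "SELECT consecutive_absence_count FROM events ORDER BY eventdate")]
    conn.close()
    return counts


print(streak_counts(ROWS))  # [0, 1, 2, 0, 0, 1, 2, 3, 4, 0]
```

Storing the count at insert time makes reads cheap, but the stored values go stale if rows are later inserted out of order or edited, which the window-function answers avoid by computing the count at query time.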

Using SQL to determine the sum of consecutive values
