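I want to determine the number of consecutive absences from the table below. Initial research suggests I can use window functions to achieve this. For the data provided, the longest streak is four consecutive absences. Could you tell me how to produce the running total of absences as a separate column?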
create table events (eventdate date, absence int);
insert into events values ('2014-10-01', 0);
insert into events values ('2014-10-08', 1);
insert into events values ('2014-10-15', 1);
insert into events values ('2014-10-22', 0);
insert into events values ('2014-11-05', 0);
insert into events values ('2014-11-12', 1);
insert into events values ('2014-11-19', 1);
insert into events values ('2014-11-26', 1);
insert into events values ('2014-12-03', 1);
insert into events values ('2014-12-10', 0);
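Answer 0 (score: 1)
You didn't specify which RDBMS you are using, but the following works with PostgreSQL's window functions and should be translatable to similar SQL engines: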
SELECT eventdate,
       absence,
       -- XXX We take advantage of the fact that absence is an int (1 or 0)
       --     otherwise we'd COUNT(1) OVER (...) and only conditionally
       --     display the count if absence = 1
       SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
         AS consecutive_absences
  FROM (SELECT spanstarts.*,
               SUM(newspan) OVER (ORDER BY eventdate) AS span
          FROM (SELECT events.*,
                       CASE LAG(absence) OVER (ORDER BY eventdate)
                            WHEN absence THEN NULL
                            ELSE 1 END AS newspan
                  FROM events)
               spanstarts
       ) eventsspans
ORDER BY eventdate;
This gives you:
eventdate | absence | consecutive_absences
------------+---------+----------------------
2014-10-01 | 0 | 0
2014-10-08 | 1 | 1
2014-10-15 | 1 | 2
2014-10-22 | 0 | 0
2014-11-05 | 0 | 0
2014-11-12 | 1 | 1
2014-11-19 | 1 | 2
2014-11-26 | 1 | 3
2014-12-03 | 1 | 4
2014-12-10 | 0 | 0
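There is an excellent dissection of the approach above on the pgsql-general mailing list. The short of it is:
1. In the innermost subquery (spanstarts), use LAG to find the start of each new span of absences, whether it is a span of 1's or a span of 0's.
2. In the next subquery (eventsspans), identify those spans by summing the number of new spans that have started up to the current row. So we find span 1, then span 2, then span 3, and so on.
3. In the outer query, sum absence within each span to get the running count.
As the SQL comment says, we cheat a little on #3 by taking advantage of the column's data type, but the net effect is the same.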
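The three steps are easier to follow when the intermediate columns are selected next to the final one. The following is only a minimal sketch and not part of the original answer: it runs the same query through Python's built-in sqlite3 module as a stand-in engine (assuming the bundled SQLite is 3.25 or later, the first release with window functions), so PostgreSQL is not required to try it. The extra output columns and the `rows` variable are choices made for this illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in engine
# (assumption: SQLite >= 3.25, which added window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (eventdate TEXT, absence INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
        ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
        ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
        ("2014-12-10", 0),
    ],
)

query = """
SELECT eventdate,
       absence,
       newspan,  -- step 1: 1 where a new span starts, NULL otherwise
       span,     -- step 2: running sum of span starts, i.e. the span number
       SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
         AS consecutive_absences  -- step 3: running total within the span
  FROM (SELECT spanstarts.*,
               SUM(newspan) OVER (ORDER BY eventdate) AS span
          FROM (SELECT events.*,
                       CASE LAG(absence) OVER (ORDER BY eventdate)
                            WHEN absence THEN NULL
                            ELSE 1 END AS newspan
                  FROM events) spanstarts
       ) eventsspans
 ORDER BY eventdate
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the sample data the span column comes out as 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, which is why partitioning by it restarts the running total at every change of the absence value.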
Answer 1 (score: 1)
Based on Gordon Linoff's answer here, you can do this:
SELECT TOP 1
       MIN(eventdate) AS spanStart,
       MAX(eventdate) AS spanEnd,
       COUNT(*) AS spanLength
FROM   ( SELECT e.*,
                ( ROW_NUMBER() OVER ( ORDER BY eventdate )
                  - ROW_NUMBER() OVER ( PARTITION BY absence ORDER BY eventdate ) ) AS grp
         FROM   #events e
       ) t
GROUP BY grp,
         absence
HAVING absence = 1
ORDER BY COUNT(*) DESC;
Returns:
spanStart | spanEnd | spanLength
---------------------------------------
2014-11-12 |2014-12-03 | 4
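The query relies on the fact that, inside a run of identical absence values, the overall row number and the per-value row number advance in step, so their difference stays constant and can serve as a label for the run. Below is a minimal sketch of that intermediate grp value, run through Python's sqlite3 module as a stand-in engine (assumption: SQLite 3.25 or later). Because SQLite has no TOP, the sketch uses LIMIT 1 instead, and it reads from a plain events table rather than the #events temporary table; `grp_rows` and `longest` are names introduced for this illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in engine
# (assumption: SQLite >= 3.25, which added window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (eventdate TEXT, absence INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
        ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
        ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
        ("2014-12-10", 0),
    ],
)

# Inner query on its own: grp is constant within each run of equal values.
grp_rows = conn.execute("""
    SELECT eventdate,
           absence,
           ROW_NUMBER() OVER (ORDER BY eventdate)
         - ROW_NUMBER() OVER (PARTITION BY absence ORDER BY eventdate) AS grp
    FROM events
    ORDER BY eventdate
""").fetchall()
for row in grp_rows:
    print(row)

# Full query, with LIMIT 1 standing in for SQL Server's TOP 1.
longest = conn.execute("""
    SELECT MIN(eventdate) AS spanStart,
           MAX(eventdate) AS spanEnd,
           COUNT(*)       AS spanLength
    FROM (SELECT e.*,
                 ROW_NUMBER() OVER (ORDER BY eventdate)
               - ROW_NUMBER() OVER (PARTITION BY absence ORDER BY eventdate) AS grp
          FROM events e) t
    GROUP BY grp, absence
    HAVING absence = 1
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()
print(longest)
```

Grouping by both grp and absence matters: the difference alone can coincide for a run of 0's and a run of 1's, so the pair is what identifies a run.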
Answer 2 (score: 0)
I don't know what your DBMS is, but this is from SQL Server. Hope it helps :)
-------------------------------------------------------------------------------------------
Query:
--tableRN is used to get the row number
;with tableRN as (SELECT a.*, ROW_NUMBER() OVER (ORDER BY a.event) as rn, COUNT(*) as maxRN
                  FROM absence a GROUP BY a.event, a.absence),
--cte is a recursive CTE that returns the
--absence value, the level (number of times 1 has appeared in a row),
--rn (row number) and total (total count)
cte (absence, level, rn, total) AS (
    SELECT 0, 0, 1, 0
    UNION ALL
    SELECT r.absence,
           CASE WHEN c.absence = 1 AND r.absence = 1 THEN level + 1
                ELSE 0
           END,
           c.rn + 1,
           CASE WHEN c.level = 1 THEN total + 1
                ELSE total
           END
    FROM cte c JOIN tableRN r ON c.rn + 1 = r.rn)
--This gets you the total number of times there
--was a consecutive absence (twice or more in a row).
SELECT MAX(c.total) AS Count FROM cte c
-------------------------------------------------------------------------------------------
Results:
|Count|
+-----+
| 2 |
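Answer 3 (score: -1)
Create a new column named consecutive_absence_count, defaulting to 0.
You could write a SQL procedure for inserts: fetch the latest record, retrieve its absence value, and check whether the new record being inserted marks an absence or a presence.
If the latest record and the new record have consecutive dates and both have absence set to 1, increment consecutive_absence_count; otherwise set it to 0.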
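The insert-time idea can be sketched as follows. This is only an illustration under stated assumptions, not the answerer's procedure: it uses Python's sqlite3 module instead of a stored procedure, `insert_event` is a hypothetical helper, rows are assumed to arrive in date order, and the most recent stored row is treated as the previous event without checking how far apart the dates are. It also departs slightly from the rule above by starting a streak at 1 on the first absence, so that the stored column matches the running count shown in the output of Answer 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " eventdate TEXT,"
    " absence INTEGER,"
    " consecutive_absence_count INTEGER DEFAULT 0)"
)


def insert_event(conn, eventdate, absence):
    """Insert one event and maintain consecutive_absence_count.

    Hypothetical helper: assumes events are inserted in date order and
    that the latest stored row is the immediately preceding event.
    """
    latest = conn.execute(
        "SELECT absence, consecutive_absence_count"
        " FROM events ORDER BY eventdate DESC LIMIT 1"
    ).fetchone()
    # Carry the streak forward only if the previous event was an absence.
    previous_count = latest[1] if latest and latest[0] == 1 else 0
    # An absence extends (or starts) the streak; a presence resets it to 0.
    count = previous_count + 1 if absence == 1 else 0
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?)", (eventdate, absence, count)
    )
    return count


sample = [
    ("2014-10-01", 0), ("2014-10-08", 1), ("2014-10-15", 1),
    ("2014-10-22", 0), ("2014-11-05", 0), ("2014-11-12", 1),
    ("2014-11-19", 1), ("2014-11-26", 1), ("2014-12-03", 1),
    ("2014-12-10", 0),
]
counts = [insert_event(conn, d, a) for d, a in sample]
print(counts)
```

The trade-off of this design is that the count is only correct while rows are inserted in order; a late insert or an update in the middle of the table would require recomputing the rows that follow, which the window-function queries avoid.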