My client has an attendance system that stores absence data in (roughly) this form (in other words, as whole days or half days):
EmployeeID AbsenceDate AbsenceDays
1 2020-06-25 1
1 2020-06-24 1
1 2020-06-23 1
1 2020-06-22 1
1 2020-06-19 1
1 2020-06-18 1
1 2020-05-25 1
1 2020-06-23 1
1 2020-06-22 0.5
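I built a report that outputs this data "as is", but the client has asked whether it could take this form instead (consecutive related days rolled up into a single range, with a total):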
EmployeeID StartDate EndDate NoOfDays
1 2020-06-18 2020-06-25 6
1 2020-05-22 2020-06-25 2.5
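I have looked at gaps-and-islands solutions, but the difficulty is that both of these ranges span an intervening weekend, which has no data and should not be counted. Is there a way to do this in standard SQL (rather than with a cursor or some other RBAR solution, which I would prefer to avoid for obvious reasons)?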
Answer 0 (score: 1)
First, this kind of grouping is relatively easy to do on the client side in a classic programming language (rather than SQL). But if you insist...
"I have looked at gaps-and-islands solutions, but the difficulty is that both of these ranges span an intervening weekend, which has no data."
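The main idea is to generate the missing rows for every weekend day with AbsenceDays set to 0, so that gaps-and-islands does not start an extra range at each weekend.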
I will use a calendar table for this (a table with a list of all dates and various flags, such as IsWeekend).
Note that this approach still returns correct results when some absence dates fall on a weekend.
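The answer relies on a Calendar table but does not show its definition. The following is only a minimal sketch of one way to build it, assuming SQL Server. The column types, the choice of covering just the year 2020, the use of sys.all_objects as a row source, and the weekday arithmetic are all assumptions made for illustration and are not part of the original answer.

```sql
-- Hypothetical Calendar table: one row per date plus a weekend flag.
CREATE TABLE Calendar
(
    dt date NOT NULL PRIMARY KEY
    ,IsWeekend bit NOT NULL
);

-- Generate 366 consecutive day offsets (2020 is a leap year).
-- sys.all_objects is used only as a convenient source of enough rows.
WITH CTE_Numbers
AS
(
    SELECT TOP (366)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
    FROM sys.all_objects
)
INSERT INTO Calendar (dt, IsWeekend)
SELECT
    D.dt
    -- 1900-01-01 was a Monday, so the remainder is 0 = Monday ... 6 = Sunday.
    -- Unlike DATEPART(weekday, ...), this does not depend on SET DATEFIRST.
    ,CASE WHEN DATEDIFF(day, '19000101', D.dt) % 7 IN (5, 6) THEN 1 ELSE 0 END
FROM
    CTE_Numbers
    CROSS APPLY (SELECT DATEADD(day, CTE_Numbers.n, CAST('20200101' AS date)) AS dt) AS D
;
```

A real calendar table would normally cover many years and might carry further flags (public holidays, for example), which the same zero-row technique could then use.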
Sample data
I have adjusted your sample data to make it more interesting and unambiguous. (Your example lists the same date twice for the same EmployeeID.)
DECLARE @T TABLE (EmployeeID int, AbsenceDate date, AbsenceDays float);
INSERT INTO @T
VALUES
(2, '2020-06-25', 0.5),
(2, '2020-06-24', 0.5),
(2, '2020-06-23', 0.5),
(2, '2020-06-22', 0.5),
(2, '2020-06-19', 0.5),
(2, '2020-06-18', 0.5),
-- here we go across the weekend and both Sat and Sun are skipped
(1, '2020-06-25', 1),
(1, '2020-06-24', 1),
(1, '2020-06-23', 1),
(1, '2020-06-22', 1),
(1, '2020-06-19', 1),
(1, '2020-06-18', 1),
-- here we go across the weekend and both Sat and Sun are skipped
(1, '2020-05-25', 1),
(1, '2020-05-23', 1),
(1, '2020-05-22', 0.5);
-- here we go across the weekend and only Sun is skipped
Query
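This query uses a Calendar table that has a column dt with every date and a flag IsWeekend.
CTE_Boundaries works out, for each employee, the range of dates we need from the calendar. CTE_Weekends gives us a row for every Saturday and Sunday in that range. Finally, we put the dates from the source table and from the calendar together.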
WITH
CTE_Boundaries
AS
(
SELECT
EmployeeID
,MIN(AbsenceDate) AS StartDate
,MAX(AbsenceDate) AS EndDate
FROM
@T AS T
GROUP BY
EmployeeID
)
,CTE_Weekends
AS
(
SELECT
CTE_Boundaries.EmployeeID
,Calendar.dt AS AbsenceDate
,0 AS AbsenceDays
FROM
CTE_Boundaries
INNER JOIN Calendar
ON Calendar.dt >= CTE_Boundaries.StartDate
AND Calendar.dt <= CTE_Boundaries.EndDate
WHERE
Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
SELECT
EmployeeID
,AbsenceDate
,AbsenceDays
FROM @T AS T
UNION ALL
SELECT
EmployeeID
,AbsenceDate
,0 AS AbsenceDays
FROM
CTE_Weekends
)
SELECT
EmployeeID
,AbsenceDate
,SUM(AbsenceDays) AS AbsenceDays
FROM CTE_AllDates
GROUP BY
EmployeeID
,AbsenceDate
;
Result
+------------+-------------+-------------+
| EmployeeID | AbsenceDate | AbsenceDays |
+------------+-------------+-------------+
| 1 | 2020-05-22 | 0.5 |
| 1 | 2020-05-23 | 1 |
| 1 | 2020-05-24 | 0 |
| 1 | 2020-05-25 | 1 |
| 1 | 2020-05-30 | 0 |
| 1 | 2020-05-31 | 0 |
| 1 | 2020-06-06 | 0 |
| 1 | 2020-06-07 | 0 |
| 1 | 2020-06-13 | 0 |
| 1 | 2020-06-14 | 0 |
| 1 | 2020-06-18 | 1 |
| 1 | 2020-06-19 | 1 |
| 1 | 2020-06-20 | 0 |
| 1 | 2020-06-21 | 0 |
| 1 | 2020-06-22 | 1 |
| 1 | 2020-06-23 | 1 |
| 1 | 2020-06-24 | 1 |
| 1 | 2020-06-25 | 1 |
| 2 | 2020-06-18 | 0.5 |
| 2 | 2020-06-19 | 0.5 |
| 2 | 2020-06-20 | 0 |
| 2 | 2020-06-21 | 0 |
| 2 | 2020-06-22 | 0.5 |
| 2 | 2020-06-23 | 0.5 |
| 2 | 2020-06-24 | 0.5 |
| 2 | 2020-06-25 | 0.5 |
+------------+-------------+-------------+
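You can now apply gaps-and-islands to this data set and you will get the date ranges 2020-05-22 - 2020-05-25 and 2020-06-18 - 2020-06-25. You will also get a group for each stand-alone weekend, but for those lone weekends the sum of AbsenceDays is zero, so we can filter them out.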
Here I solve gaps-and-islands with ROW_NUMBER:
Final query
WITH
CTE_Boundaries
AS
(
SELECT
EmployeeID
,MIN(AbsenceDate) AS StartDate
,MAX(AbsenceDate) AS EndDate
FROM
@T AS T
GROUP BY
EmployeeID
)
,CTE_Weekends
AS
(
SELECT
CTE_Boundaries.EmployeeID
,Calendar.dt AS AbsenceDate
,0 AS AbsenceDays
FROM
CTE_Boundaries
INNER JOIN Calendar
ON Calendar.dt >= CTE_Boundaries.StartDate
AND Calendar.dt <= CTE_Boundaries.EndDate
WHERE
Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
SELECT
EmployeeID
,AbsenceDate
,AbsenceDays
FROM @T AS T
UNION ALL
SELECT
EmployeeID
,AbsenceDate
,0 AS AbsenceDays
FROM
CTE_Weekends
)
,CTE_Data
AS
(
SELECT
EmployeeID
,AbsenceDate
,SUM(AbsenceDays) AS AbsenceDays
FROM CTE_AllDates
GROUP BY
EmployeeID
,AbsenceDate
)
-- apply gaps and islands to CTE_Data
,CTE_RowNumbers
AS
(
SELECT
EmployeeID
,AbsenceDate
,AbsenceDays
,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY AbsenceDate) AS rn1
,DATEDIFF(day, '2020-01-01', AbsenceDate) AS rn2
FROM
CTE_Data
)
SELECT
EmployeeID
,MIN(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS StartAbsenceDate
,MAX(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS EndAbsenceDate
,SUM(AbsenceDays) AS NoOfDays
FROM
CTE_RowNumbers
GROUP BY
EmployeeID
,rn2 - rn1
HAVING
SUM(AbsenceDays) > 0
ORDER BY
EmployeeID
,StartAbsenceDate
;
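We need CASE WHEN AbsenceDays > 0 THEN AbsenceDate END for the case where the first AbsenceDate of a range is a Monday or the last one is a Friday. Without this check, the two adjacent weekend days could be attached to the start or end of the final range.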
Result
+------------+------------------+----------------+----------+
| EmployeeID | StartAbsenceDate | EndAbsenceDate | NoOfDays |
+------------+------------------+----------------+----------+
| 1 | 2020-05-22 | 2020-05-25 | 2.5 |
| 1 | 2020-06-18 | 2020-06-25 | 6 |
| 2 | 2020-06-18 | 2020-06-25 | 3 |
+------------+------------------+----------------+----------+
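The grouping key rn2 - rn1 in the final query can be seen in isolation with a small stand-alone sketch. The four inline dates are made up for illustration: within a run of consecutive dates both the row number and the day offset grow by one per row, so their difference stays constant, and it jumps as soon as a date is missing.

```sql
-- Illustration only: three consecutive dates followed by a gap.
SELECT
    D.dt
    ,ROW_NUMBER() OVER (ORDER BY D.dt) AS rn1
    ,DATEDIFF(day, '2020-01-01', D.dt) AS rn2
    ,DATEDIFF(day, '2020-01-01', D.dt)
        - ROW_NUMBER() OVER (ORDER BY D.dt) AS grp
FROM
    (VALUES
        (CAST('2020-06-18' AS date))
        ,('2020-06-19')
        ,('2020-06-20')
        ,('2020-06-25')
    ) AS D(dt)
ORDER BY
    D.dt
;
-- rn1 = 1, 2, 3, 4
-- rn2 = 169, 170, 171, 176
-- grp = 168, 168, 168, 172  (the first three dates form one island)
```

The anchor date '2020-01-01' is arbitrary. Any fixed date gives the same islands, because only changes in the difference matter.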
Answer 1 (score: 0)
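Your data does not look right. There are multiple rows for the same day. I am guessing that is not allowed and that those rows should belong to different employees.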
To handle the weekends, you can use lag(), a cumulative sum, and some date arithmetic:
select EmployeeId, min(AbsenceDate), max(AbsenceDate), sum(AbsenceDays)
from (select t.*,
sum(case when datename(weekday, AbsenceDate) in ('Tuesday', 'Wednesday', 'Thursday', 'Friday') and prev_ad = dateadd(day, -1, AbsenceDate)
then 0
when datename(weekday, AbsenceDate) in ('Monday') and prev_ad = dateadd(day, -3, AbsenceDate)
then 0
else 1
end) over (partition by EmployeeId order by AbsenceDate) as grp
from (select t.*,
lag(AbsenceDate) over (partition by EmployeeId order by AbsenceDate) as prev_ad
from t
) t
) t
group by EmployeeId, grp;
Here is a db<>fiddle. Based on the sample data the results look correct, but they differ from those in your question.
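One caveat with the query above: datename(weekday, ...) returns names in the session language, so the comparison with English day names only works when the session language is English. The sketch below is a hedged variant of the inner classification step only, not the answer's own code. It swaps in a day-number calculation based on the fact that 1900-01-01 was a Monday, and it runs against four made-up inline rows in place of table t.

```sql
-- Illustration only: flag the rows that start a new absence range.
WITH t
AS
(
    SELECT EmployeeId, AbsenceDate
    FROM
        (VALUES
            (1, CAST('2020-06-18' AS date))   -- Thursday
            ,(1, '2020-06-19')                -- Friday
            ,(1, '2020-06-22')                -- Monday
            ,(1, '2020-06-25')                -- Thursday
        ) AS v(EmployeeId, AbsenceDate)
)
,WithPrev
AS
(
    SELECT
        t.EmployeeId
        ,t.AbsenceDate
        ,LAG(t.AbsenceDate) OVER (PARTITION BY t.EmployeeId ORDER BY t.AbsenceDate) AS prev_ad
        -- 0 = Monday ... 6 = Sunday, independent of language and DATEFIRST
        ,DATEDIFF(day, '19000101', t.AbsenceDate) % 7 AS wd
    FROM t
)
SELECT
    EmployeeId
    ,AbsenceDate
    ,prev_ad
    ,CASE
        -- Tuesday to Friday continues a range if the previous row is the day before
        WHEN wd BETWEEN 1 AND 4 AND prev_ad = DATEADD(day, -1, AbsenceDate) THEN 0
        -- Monday continues a range if the previous row is the preceding Friday
        WHEN wd = 0 AND prev_ad = DATEADD(day, -3, AbsenceDate) THEN 0
        ELSE 1
    END AS starts_new_group
FROM WithPrev
ORDER BY
    EmployeeId
    ,AbsenceDate
;
-- starts_new_group = 1, 0, 0, 1
```

A running sum of starts_new_group per employee then yields the grp value that the answer groups by.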