Question

My client has an attendance system that stores absence data in (roughly) this form (in other words, one day or half a day per row):

EmployeeID   AbsenceDate   AbsenceDays
1            2020-06-25    1
1            2020-06-24    1
1            2020-06-23    1
1            2020-06-22    1
1            2020-06-19    1
1            2020-06-18    1
1            2020-05-25    1
1            2020-06-23    1
1            2020-06-22    0.5

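I've built a report that outputs this data as is, but the client has asked whether it could take this form instead (consecutive related days rolled up into a single range, with the total summed):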

EmployeeID   StartDate   EndDate       NoOfDays
1            2020-06-18  2020-06-25    6
1            2020-05-22  2020-06-25    2.5

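I've looked at gaps-and-islands solutions, but the difficulty is that for both of these ranges there's an intervening weekend for which no absence data exists. Is there a way to do this in standard SQL (rather than with a cursor or some other RBAR solution, which I'd rather avoid for obvious reasons)?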

Answer 1

First, this kind of grouping is relatively easy to do on the client side in a classic programming language rather than in SQL. But if you insist...

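"I've looked at gaps-and-islands solutions, but the difficulty is that for both of these ranges there's an intervening weekend for which no absence data exists."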

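The main idea is to generate the missing rows for every weekend with an AbsenceDays value of 0, so that gaps-and-islands does not start an extra range at each weekend.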

I'll use a calendar table for this (a table that lists every date along with various flags, such as IsWeekend).
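If you don't already have a calendar table, a minimal sketch might look like the following. Only the names Calendar, dt and IsWeekend are taken from the queries in this answer; the date range, the recursive CTE used to fill the table, and the assumption that the weekend is Saturday and Sunday are illustrative choices.

```sql
-- Hypothetical minimal calendar table; the queries in this answer only need dt and IsWeekend.
CREATE TABLE Calendar
(
    dt date NOT NULL PRIMARY KEY,
    IsWeekend bit NOT NULL
);

-- Fill one row per day of 2020 (the range is an assumption; widen it as needed).
WITH CTE_Dates
AS
(
    SELECT CAST('2020-01-01' AS date) AS dt

    UNION ALL

    SELECT DATEADD(day, 1, dt)
    FROM CTE_Dates
    WHERE dt < '2020-12-31'
)
INSERT INTO Calendar (dt, IsWeekend)
SELECT
    dt
    -- 1900-01-01 was a Monday, so the remainder is 0 = Monday ... 5 = Saturday, 6 = Sunday.
    -- This does not depend on DATEFIRST or on the session language.
    ,CASE WHEN DATEDIFF(day, '19000101', dt) % 7 >= 5 THEN 1 ELSE 0 END
FROM CTE_Dates
OPTION (MAXRECURSION 0);
```

A real calendar table would usually cover many years and carry more flags (public holidays, for instance), which could be treated the same way as weekends here.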

Note that this approach returns correct results even when some absence dates fall on a weekend.

Sample data

I've adjusted your sample data to make it more interesting and unambiguous. (Your example listed the same date twice for the same EmployeeID.)

DECLARE @T TABLE (EmployeeID int, AbsenceDate date, AbsenceDays float);

INSERT INTO @T
VALUES
(2, '2020-06-25', 0.5),
(2, '2020-06-24', 0.5),
(2, '2020-06-23', 0.5),
(2, '2020-06-22', 0.5),
(2, '2020-06-19', 0.5),
(2, '2020-06-18', 0.5),
-- here we go across the weekend and both Sat and Sun are skipped

(1, '2020-06-25', 1),
(1, '2020-06-24', 1),
(1, '2020-06-23', 1),
(1, '2020-06-22', 1),
(1, '2020-06-19', 1),
(1, '2020-06-18', 1),
-- here we go across the weekend and both Sat and Sun are skipped

(1, '2020-05-25', 1),
(1, '2020-05-23', 1),
(1, '2020-05-22', 0.5);
-- here we go across the weekend and only Sun is skipped

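Query

This query uses a Calendar table with a dt column that holds every date and an IsWeekend flag.

CTE_Boundaries computes, for each employee, the range of dates we need from the calendar. CTE_Weekends gives us one row for every Saturday and Sunday in that range. Finally, CTE_AllDates puts the dates from the source table and from the calendar together, and the outer SELECT sums them per employee and date.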

WITH
CTE_Boundaries
AS
(
    SELECT
        EmployeeID
        ,MIN(AbsenceDate) AS StartDate
        ,MAX(AbsenceDate) AS EndDate
    FROM
        @T AS T
    GROUP BY
        EmployeeID
)
,CTE_Weekends
AS
(
    SELECT
        CTE_Boundaries.EmployeeID
        ,Calendar.dt AS AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Boundaries
        INNER JOIN Calendar
            ON  Calendar.dt >= CTE_Boundaries.StartDate
            AND Calendar.dt <= CTE_Boundaries.EndDate
    WHERE
        Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
    FROM @T AS T

    UNION ALL

    SELECT
        EmployeeID
        ,AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Weekends
)
SELECT
    EmployeeID
    ,AbsenceDate
    ,SUM(AbsenceDays) AS AbsenceDays
FROM CTE_AllDates
GROUP BY
    EmployeeID
    ,AbsenceDate
;

Result

+------------+-------------+-------------+
| EmployeeID | AbsenceDate | AbsenceDays |
+------------+-------------+-------------+
|          1 | 2020-05-22  |         0.5 |
|          1 | 2020-05-23  |           1 |
|          1 | 2020-05-24  |           0 |
|          1 | 2020-05-25  |           1 |
|          1 | 2020-05-30  |           0 |
|          1 | 2020-05-31  |           0 |
|          1 | 2020-06-06  |           0 |
|          1 | 2020-06-07  |           0 |
|          1 | 2020-06-13  |           0 |
|          1 | 2020-06-14  |           0 |
|          1 | 2020-06-18  |           1 |
|          1 | 2020-06-19  |           1 |
|          1 | 2020-06-20  |           0 |
|          1 | 2020-06-21  |           0 |
|          1 | 2020-06-22  |           1 |
|          1 | 2020-06-23  |           1 |
|          1 | 2020-06-24  |           1 |
|          1 | 2020-06-25  |           1 |
|          2 | 2020-06-18  |         0.5 |
|          2 | 2020-06-19  |         0.5 |
|          2 | 2020-06-20  |           0 |
|          2 | 2020-06-21  |           0 |
|          2 | 2020-06-22  |         0.5 |
|          2 | 2020-06-23  |         0.5 |
|          2 | 2020-06-24  |         0.5 |
|          2 | 2020-06-25  |         0.5 |
+------------+-------------+-------------+

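Now you can apply gaps-and-islands to this dataset, and you'll get one group for the dates 2020-05-22 - 2020-05-25 and another for 2020-06-18 - 2020-06-25. You'll also get a group for each of the other weekends, but for those lone weekends the sum of AbsenceDays is zero, so we can filter them out.

Here I solve gaps-and-islands with ROW_NUMBER:

Final query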

WITH
CTE_Boundaries
AS
(
    SELECT
        EmployeeID
        ,MIN(AbsenceDate) AS StartDate
        ,MAX(AbsenceDate) AS EndDate
    FROM
        @T AS T
    GROUP BY
        EmployeeID
)
,CTE_Weekends
AS
(
    SELECT
        CTE_Boundaries.EmployeeID
        ,Calendar.dt AS AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Boundaries
        INNER JOIN Calendar
            ON  Calendar.dt >= CTE_Boundaries.StartDate
            AND Calendar.dt <= CTE_Boundaries.EndDate
    WHERE
        Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
    FROM @T AS T

    UNION ALL

    SELECT
        EmployeeID
        ,AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Weekends
)
,CTE_Data
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,SUM(AbsenceDays) AS AbsenceDays
    FROM CTE_AllDates
    GROUP BY
        EmployeeID
        ,AbsenceDate
)

-- apply gaps and islands to CTE_Data
,CTE_RowNumbers
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY AbsenceDate) AS rn1
        ,DATEDIFF(day, '2020-01-01', AbsenceDate) AS rn2
    FROM
        CTE_Data
)
SELECT
    EmployeeID
    ,MIN(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS StartAbsenceDate
    ,MAX(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS EndAbsenceDate
    ,SUM(AbsenceDays) AS NoOfDays
FROM
    CTE_RowNumbers
GROUP BY
    EmployeeID
    ,rn2 - rn1
HAVING
    SUM(AbsenceDays) > 0
ORDER BY
    EmployeeID
    ,StartAbsenceDate
;

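We need the CASE WHEN AbsenceDays > 0 THEN AbsenceDate END check for the case where the first or last AbsenceDate of a range is a Monday or a Friday. Without this check, the two adjacent weekend days could be attached to the final range.

Result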

+------------+------------------+----------------+----------+
| EmployeeID | StartAbsenceDate | EndAbsenceDate | NoOfDays |
+------------+------------------+----------------+----------+
|          1 | 2020-05-22       | 2020-05-25     |      2.5 |
|          1 | 2020-06-18       | 2020-06-25     |        6 |
|          2 | 2020-06-18       | 2020-06-25     |        3 |
+------------+------------------+----------------+----------+
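To see why grouping by rn2 - rn1 isolates each island, here is a small sketch on a handful of hand-picked dates (the table variable @D and its contents are made up for illustration; the two expressions are the same as in CTE_RowNumbers, minus the PARTITION BY, since there is only one employee here).

```sql
-- Illustration only: a few dates for a single hypothetical employee.
DECLARE @D TABLE (AbsenceDate date);

INSERT INTO @D
VALUES
('2020-06-18'),
('2020-06-19'),
('2020-06-20'),
-- gap: 21st to 23rd are missing
('2020-06-24'),
('2020-06-25');

SELECT
    AbsenceDate
    ,ROW_NUMBER() OVER (ORDER BY AbsenceDate) AS rn1
    ,DATEDIFF(day, '2020-01-01', AbsenceDate) AS rn2
    -- Both counters grow by 1 per row while the dates are consecutive,
    -- so their difference stays constant within an island and jumps after a gap.
    ,DATEDIFF(day, '2020-01-01', AbsenceDate)
        - ROW_NUMBER() OVER (ORDER BY AbsenceDate) AS grp
FROM @D
ORDER BY AbsenceDate;
```

The first three rows share one grp value and the last two share a different one, which is exactly what GROUP BY EmployeeID, rn2 - rn1 relies on in the final query.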

Answer 2

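Your data doesn't look right. There are multiple rows for the same day. I'm guessing that this is not allowed and that those rows should belong to a different employee.

To handle the weekends, you can use lag(), a cumulative sum, and some date arithmetic: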

select EmployeeId, min(AbsenceDate), max(AbsenceDate), sum(AbsenceDays)
from (select t.*,
             sum(case when datename(weekday, AbsenceDate) in ('Tuesday', 'Wednesday', 'Thursday', 'Friday') and prev_ad = dateadd(day, -1, AbsenceDate)
                      then 0
                      when datename(weekday, AbsenceDate) in ('Monday') and prev_ad = dateadd(day, -3, AbsenceDate)
                      then 0
                      else 1
                 end) over (partition by EmployeeId order by AbsenceDate) as grp
      from (select t.*,
                   lag(AbsenceDate) over (partition by EmployeeId order by AbsenceDate) as prev_ad
            from t
           ) t
     ) t
group by EmployeeId, grp;

Here is a db<>fiddle. With the sample data the results look right, but they differ from those in your question.
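One caveat with the query above: datename(weekday, ...) returns the day name in the session language, so the comparisons against 'Monday', 'Tuesday' and so on assume an English session. A possible language-independent variant is sketched below. It assumes the same table t; the wd column and the output aliases are my own additions, not part of the original query.

```sql
select EmployeeId,
       min(AbsenceDate) as StartDate,
       max(AbsenceDate) as EndDate,
       sum(AbsenceDays) as NoOfDays
from (select t.*,
             -- Tuesday..Friday continues a group if the previous absence was yesterday
             sum(case when wd between 1 and 4 and prev_ad = dateadd(day, -1, AbsenceDate)
                      then 0
                      -- Monday continues a group if the previous absence was last Friday
                      when wd = 0 and prev_ad = dateadd(day, -3, AbsenceDate)
                      then 0
                      else 1
                 end) over (partition by EmployeeId order by AbsenceDate) as grp
      from (select t.*,
                   lag(AbsenceDate) over (partition by EmployeeId order by AbsenceDate) as prev_ad,
                   -- 1900-01-01 was a Monday: 0 = Monday, 1 = Tuesday, ..., 6 = Sunday
                   datediff(day, '19000101', AbsenceDate) % 7 as wd
            from t
           ) t
     ) t
group by EmployeeId, grp;
```

The grouping logic is unchanged; only the weekday test is replaced by integer arithmetic, which also avoids any dependency on the DATEFIRST setting.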
