Is there a way to solve a gaps-and-islands date-range problem while ignoring weekends?

Posted: 2020-06-25 10:02:15

Tags: sql sql-server gaps-and-islands

My client has an attendance system that stores absence data in (roughly) this form, where each row is a whole day or half a day:

EmployeeID   AbsenceDate   AbsenceDays
1            2020-06-25    1
1            2020-06-24    1
1            2020-06-23    1
1            2020-06-22    1
1            2020-06-19    1
1            2020-06-18    1
1            2020-05-25    1
1            2020-06-23    1
1            2020-06-22    0.5

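I have built a report that outputs this data "as is", but the client has asked whether it could take this form instead, with consecutive days rolled up into a single range and the days totalled: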

EmployeeID   StartDate   EndDate       NoOfDays
1            2020-06-18  2020-06-25    6
1            2020-05-22  2020-06-25    2.5

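I have looked at gaps-and-islands solutions, but the difficulty is that in both cases there is an intervening weekend that should not break the range. Is there any way to do this with standard SQL, rather than with a cursor or some other RBAR solution, which for obvious reasons I would rather avoid?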

2 answers:

Answer 0 (score: 1)

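First of all, this kind of grouping is relatively easy to do on the client side in a classic programming language rather than in SQL. But if you insist...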
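For reference, the client-side grouping mentioned above could look like this minimal Python sketch. It assumes the rows have already been fetched as `(EmployeeID, AbsenceDate, AbsenceDays)` tuples; the function names `next_working_day` and `group_absences` are hypothetical, public holidays are ignored, and duplicate rows for the same date are simply summed. The sample rows are employee 1 from the adjusted data further down.

```python
from datetime import date, timedelta

def next_working_day(d):
    """Next Monday-Friday date after d (public holidays are not considered)."""
    d += timedelta(days=1)
    while d.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def group_absences(rows):
    """Collapse (employee_id, absence_date, absence_days) rows into
    (employee_id, start, end, total) ranges that ignore weekend gaps."""
    totals = {}
    for emp, d, days in rows:        # sum duplicate rows for the same date
        totals[(emp, d)] = totals.get((emp, d), 0) + days
    ranges = []
    for (emp, d), days in sorted(totals.items()):
        last = ranges[-1] if ranges else None
        # Extend the current range if d is no later than the next working
        # day after its end (this also absorbs absences on a weekend).
        if last and last[0] == emp and d <= next_working_day(last[2]):
            last[2] = d
            last[3] += days
        else:
            ranges.append([emp, d, d, days])
    return [tuple(r) for r in ranges]

rows = [
    (1, date(2020, 6, 25), 1), (1, date(2020, 6, 24), 1), (1, date(2020, 6, 23), 1),
    (1, date(2020, 6, 22), 1), (1, date(2020, 6, 19), 1), (1, date(2020, 6, 18), 1),
    (1, date(2020, 5, 25), 1), (1, date(2020, 5, 23), 1), (1, date(2020, 5, 22), 0.5),
]
ranges = group_absences(rows)
```

A single pass over the sorted rows is enough because each range only needs to remember its last date and running total.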


"I have looked at gaps-and-islands solutions, but the difficulty is that for both of these there is an intervening weekend in the data."

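The main idea is to generate the missing rows for all weekends, with AbsenceDays = 0, so that gaps-and-islands does not create extra ranges at the weekends.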

I would use a calendar table for this: a table that lists every date along with various flags, such as IsWeekend.
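If you don't have a calendar table yet, this is one way to produce the rows for it. It's a minimal Python sketch that assumes a table named `Calendar` with just the two columns this answer uses, `dt` and `IsWeekend`. The helper names are hypothetical, and real calendar tables usually carry more flags, such as holidays. The statements are batched because SQL Server accepts at most 1000 rows in a single `INSERT ... VALUES` list.

```python
from datetime import date, timedelta

def calendar_rows(start, end):
    """Yield (dt, is_weekend) for every date from start to end inclusive."""
    d = start
    while d <= end:
        yield d, 1 if d.weekday() >= 5 else 0   # Saturday/Sunday -> 1
        d += timedelta(days=1)

def calendar_inserts(start, end, batch=1000):
    """Yield INSERT statements for an assumed Calendar(dt, IsWeekend) table,
    at most `batch` rows each (SQL Server's VALUES-list limit is 1000)."""
    rows = list(calendar_rows(start, end))
    for i in range(0, len(rows), batch):
        values = ", ".join(f"('{d.isoformat()}', {flag})" for d, flag in rows[i:i + batch])
        yield f"INSERT INTO Calendar (dt, IsWeekend) VALUES {values};"

stmts = list(calendar_inserts(date(2020, 6, 19), date(2020, 6, 21)))
```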

Note that this approach returns the correct result even if some absence dates fall on a weekend.

Sample data

I have adjusted your sample data to make it more interesting and explicit. (Your example listed the same date twice for the same EmployeeID.)

DECLARE @T TABLE (EmployeeID int, AbsenceDate date, AbsenceDays float);

INSERT INTO @T
VALUES
(2, '2020-06-25', 0.5),
(2, '2020-06-24', 0.5),
(2, '2020-06-23', 0.5),
(2, '2020-06-22', 0.5),
(2, '2020-06-19', 0.5),
(2, '2020-06-18', 0.5),
-- here we go across the weekend and both Sat and Sun are skipped

(1, '2020-06-25', 1),
(1, '2020-06-24', 1),
(1, '2020-06-23', 1),
(1, '2020-06-22', 1),
(1, '2020-06-19', 1),
(1, '2020-06-18', 1),
-- here we go across the weekend and both Sat and Sun are skipped

(1, '2020-05-25', 1),
(1, '2020-05-23', 1),
(1, '2020-05-22', 0.5);
-- here we go across the weekend and only Sun is skipped

Query

This query uses a Calendar table with a column dt for every date and an IsWeekend flag.

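CTE_Boundaries calculates the range of dates needed for each employee (their first and last absence). CTE_Weekends joins that range to the calendar and returns a row for every Saturday and Sunday inside it. CTE_AllDates then combines the source rows with these weekend rows, and the outer query sums AbsenceDays per employee and date.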

WITH
CTE_Boundaries
AS
(
    SELECT
        EmployeeID
        ,MIN(AbsenceDate) AS StartDate
        ,MAX(AbsenceDate) AS EndDate
    FROM
        @T AS T
    GROUP BY
        EmployeeID
)
,CTE_Weekends
AS
(
    SELECT
        CTE_Boundaries.EmployeeID
        ,Calendar.dt AS AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Boundaries
        INNER JOIN Calendar
            ON  Calendar.dt >= CTE_Boundaries.StartDate
            AND Calendar.dt <= CTE_Boundaries.EndDate
    WHERE
        Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
    FROM @T AS T

    UNION ALL

    SELECT
        EmployeeID
        ,AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Weekends
)
SELECT
    EmployeeID
    ,AbsenceDate
    ,SUM(AbsenceDays) AS AbsenceDays
FROM CTE_AllDates
GROUP BY
    EmployeeID
    ,AbsenceDate
;

Result

+------------+-------------+-------------+
| EmployeeID | AbsenceDate | AbsenceDays |
+------------+-------------+-------------+
|          1 | 2020-05-22  |         0.5 |
|          1 | 2020-05-23  |           1 |
|          1 | 2020-05-24  |           0 |
|          1 | 2020-05-25  |           1 |
|          1 | 2020-05-30  |           0 |
|          1 | 2020-05-31  |           0 |
|          1 | 2020-06-06  |           0 |
|          1 | 2020-06-07  |           0 |
|          1 | 2020-06-13  |           0 |
|          1 | 2020-06-14  |           0 |
|          1 | 2020-06-18  |           1 |
|          1 | 2020-06-19  |           1 |
|          1 | 2020-06-20  |           0 |
|          1 | 2020-06-21  |           0 |
|          1 | 2020-06-22  |           1 |
|          1 | 2020-06-23  |           1 |
|          1 | 2020-06-24  |           1 |
|          1 | 2020-06-25  |           1 |
|          2 | 2020-06-18  |         0.5 |
|          2 | 2020-06-19  |         0.5 |
|          2 | 2020-06-20  |           0 |
|          2 | 2020-06-21  |           0 |
|          2 | 2020-06-22  |         0.5 |
|          2 | 2020-06-23  |         0.5 |
|          2 | 2020-06-24  |         0.5 |
|          2 | 2020-06-25  |         0.5 |
+------------+-------------+-------------+
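The padding step can be checked outside the database with a small Python mirror of CTE_Boundaries, CTE_Weekends, CTE_AllDates and the final GROUP BY. This is a sketch, not the answer's code. The function name `pad_weekends` is hypothetical, and weekends are detected with `date.weekday()` instead of a calendar table. Run against the adjusted sample data, it produces the same 26 rows as the table above.

```python
from collections import defaultdict
from datetime import date, timedelta

def pad_weekends(rows):
    """Return sorted (employee_id, date, total_days) rows with a 0-day row
    for every Saturday and Sunday between each employee's first and last
    absence, like the query above."""
    totals = defaultdict(float)
    bounds = {}                               # employee -> (StartDate, EndDate)
    for emp, d, days in rows:
        totals[(emp, d)] += days              # SUM(AbsenceDays) per date
        lo, hi = bounds.get(emp, (d, d))
        bounds[emp] = (min(lo, d), max(hi, d))
    for emp, (lo, hi) in bounds.items():
        d = lo
        while d <= hi:
            if d.weekday() >= 5:              # the IsWeekend = 1 calendar rows
                totals[(emp, d)] += 0         # UNION ALL ... 0 AS AbsenceDays
            d += timedelta(days=1)
    return sorted((emp, d, days) for (emp, d), days in totals.items())

# The adjusted sample data from this answer.
rows = (
    [(2, date(2020, 6, d), 0.5) for d in (25, 24, 23, 22, 19, 18)]
    + [(1, date(2020, 6, d), 1) for d in (25, 24, 23, 22, 19, 18)]
    + [(1, date(2020, 5, 25), 1), (1, date(2020, 5, 23), 1), (1, date(2020, 5, 22), 0.5)]
)
padded = pad_weekends(rows)
```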

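Now you can apply gaps and islands to this data set, and you will get groups for the dates 2020-05-22 to 2020-05-25 and 2020-06-18 to 2020-06-25. You will also get a group for each weekend that stands on its own, but for those isolated weekends the sum of AbsenceDays is zero, so we can filter them out.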

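Here I solve the gaps-and-islands part with ROW_NUMBER. Within a run of consecutive dates, both the day number (rn2) and the row number (rn1) increase by 1 per row, so rn2 - rn1 stays constant and identifies the island: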

Final query

WITH
CTE_Boundaries
AS
(
    SELECT
        EmployeeID
        ,MIN(AbsenceDate) AS StartDate
        ,MAX(AbsenceDate) AS EndDate
    FROM
        @T AS T
    GROUP BY
        EmployeeID
)
,CTE_Weekends
AS
(
    SELECT
        CTE_Boundaries.EmployeeID
        ,Calendar.dt AS AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Boundaries
        INNER JOIN Calendar
            ON  Calendar.dt >= CTE_Boundaries.StartDate
            AND Calendar.dt <= CTE_Boundaries.EndDate
    WHERE
        Calendar.IsWeekend = 1
)
,CTE_AllDates
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
    FROM @T AS T

    UNION ALL

    SELECT
        EmployeeID
        ,AbsenceDate
        ,0 AS AbsenceDays
    FROM
        CTE_Weekends
)
,CTE_Data
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,SUM(AbsenceDays) AS AbsenceDays
    FROM CTE_AllDates
    GROUP BY
        EmployeeID
        ,AbsenceDate
)

-- apply gaps and islands to CTE_Data
,CTE_RowNumbers
AS
(
    SELECT
        EmployeeID
        ,AbsenceDate
        ,AbsenceDays
        ,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY AbsenceDate) AS rn1
        ,DATEDIFF(day, '2020-01-01', AbsenceDate) AS rn2
    FROM
        CTE_Data
)
SELECT
    EmployeeID
    ,MIN(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS StartAbsenceDate
    ,MAX(CASE WHEN AbsenceDays > 0 THEN AbsenceDate END) AS EndAbsenceDate
    ,SUM(AbsenceDays) AS NoOfDays
FROM
    CTE_RowNumbers
GROUP BY
    EmployeeID
    ,rn2 - rn1
HAVING
    SUM(AbsenceDays) > 0
ORDER BY
    EmployeeID
    ,StartAbsenceDate
;

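The CASE WHEN AbsenceDays > 0 THEN AbsenceDate END inside MIN and MAX is needed for the cases where the first AbsenceDate of a range is a Monday or the last one is a Friday. Without this check, the two adjacent padded weekend days could be attached to the start or the end of the range.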

Result

+------------+------------------+----------------+----------+
| EmployeeID | StartAbsenceDate | EndAbsenceDate | NoOfDays |
+------------+------------------+----------------+----------+
|          1 | 2020-05-22       | 2020-05-25     |      2.5 |
|          1 | 2020-06-18       | 2020-06-25     |        6 |
|          2 | 2020-06-18       | 2020-06-25     |        3 |
+------------+------------------+----------------+----------+
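The gaps-and-islands step of the final query can be mirrored in Python in the same way. This is a sketch under these assumptions: the input already has the shape of CTE_Data, with one row per employee and date and the 0-day weekend rows included, and the function name `islands` is hypothetical. The sample includes an isolated weekend, which gets filtered out, and a hypothetical employee 3 whose Friday absence is followed by a padded weekend. That case shows why MIN and MAX only look at rows with AbsenceDays > 0.

```python
from collections import defaultdict
from datetime import date

EPOCH = date(2020, 1, 1)  # same anchor as DATEDIFF(day, '2020-01-01', AbsenceDate)

def islands(padded_rows):
    """Return (employee_id, start, end, total) tuples like the final query."""
    groups = defaultdict(list)
    rn1 = defaultdict(int)
    for emp, d, days in sorted(padded_rows):
        rn1[emp] += 1                         # ROW_NUMBER() per employee, ordered by date
        rn2 = (d - EPOCH).days                # day number of the date
        groups[(emp, rn2 - rn1[emp])].append((d, days))
    result = []
    for (emp, _), items in groups.items():
        total = sum(days for _, days in items)
        if total > 0:                         # HAVING SUM(AbsenceDays) > 0
            absent = [d for d, days in items if days > 0]   # CASE WHEN AbsenceDays > 0
            result.append((emp, min(absent), max(absent), total))
    return sorted(result)                     # ORDER BY EmployeeID, StartAbsenceDate

padded = [
    (1, date(2020, 5, 22), 0.5),
    (1, date(2020, 5, 23), 1.0),
    (1, date(2020, 5, 24), 0.0),
    (1, date(2020, 5, 25), 1.0),
    (1, date(2020, 5, 30), 0.0),              # weekend-only island, dropped
    (1, date(2020, 5, 31), 0.0),
    (3, date(2020, 6, 19), 1.0),              # hypothetical: Friday absence...
    (3, date(2020, 6, 20), 0.0),              # ...followed by a padded weekend
    (3, date(2020, 6, 21), 0.0),
    (3, date(2020, 6, 23), 1.0),
]
ranges = islands(padded)
```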

Answer 1 (score: 0)

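Your data doesn't look right: there are multiple rows for the same day. I'm guessing that isn't allowed and that those rows belong to a different employee.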

To handle the weekend problem, you can use lag(), a cumulative sum, and some date arithmetic:

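select EmployeeId,
       min(AbsenceDate) as StartDate,
       max(AbsenceDate) as EndDate,
       sum(AbsenceDays) as NoOfDays
from (select t.*,
             -- 0 = continues the current range: a Tue-Fri date right after the previous absence,
             -- or a Monday right after a Friday absence; 1 = starts a new range.
             -- datename() returns names in the session language (SET LANGUAGE),
             -- so these literals assume an English session.
             sum(case when datename(weekday, AbsenceDate) in ('Tuesday', 'Wednesday', 'Thursday', 'Friday') and prev_ad = dateadd(day, -1, AbsenceDate)
                      then 0
                      when datename(weekday, AbsenceDate) in ('Monday') and prev_ad = dateadd(day, -3, AbsenceDate)
                      then 0
                      else 1
                 end) over (partition by EmployeeId order by AbsenceDate) as grp
      from (select t.*,
                   lag(AbsenceDate) over (partition by EmployeeId order by AbsenceDate) as prev_ad
            from t
           ) t
     ) t
group by EmployeeId, grp;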

Here is a db<>fiddle. Based on the sample data the results look correct, but they differ from the ones in your question.
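For comparison, here is the same lag()/running-sum logic as a Python sketch. The function name `group_by_lag` is hypothetical, and it assumes at most one row per employee and date, as the query does. It uses `date.weekday()` instead of `datename()`, so it does not depend on the session language. As in the query's CASE expression, a row dated on a Saturday or Sunday always falls into the `else 1` branch and starts a new group.

```python
from datetime import date, timedelta

def group_by_lag(rows):
    """rows: (employee_id, absence_date, absence_days), one row per employee
    and date. Returns sorted (employee_id, start, end, total) tuples."""
    groups = {}                                # (employee, grp) -> [start, end, total]
    prev_emp, prev_ad, grp = None, None, 0
    for emp, d, days in sorted(rows):
        if emp != prev_emp:                    # new PARTITION BY EmployeeId
            prev_emp, prev_ad, grp = emp, None, 0
        wd = d.weekday()                       # Monday = 0 ... Sunday = 6
        continues = prev_ad is not None and (
            (1 <= wd <= 4 and prev_ad == d - timedelta(days=1))   # Tue-Fri after the previous day
            or (wd == 0 and prev_ad == d - timedelta(days=3))     # Monday after Friday
        )
        if not continues:
            grp += 1                           # the "else 1" branch starts a new group
        if (emp, grp) in groups:
            groups[(emp, grp)][1] = d
            groups[(emp, grp)][2] += days
        else:
            groups[(emp, grp)] = [d, d, days]
        prev_ad = d                            # becomes lag(AbsenceDate) for the next row
    return sorted((emp, s, e, t) for (emp, _), (s, e, t) in groups.items())

rows = [(1, date(2020, 6, d), 1) for d in (25, 24, 23, 22, 19, 18)]
rows += [(2, date(2020, 5, 22), 0.5), (2, date(2020, 5, 25), 1)]
ranges = group_by_lag(rows)
```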