我有一个数据集,其中包含员工的身份,状态和日期范围 下面给出的输入数据集是一名员工的详细信息 记录中的日期范围是连续的(按照确切的顺序),使得第二行的开始日期将是第一行的结束日期的下一个日期。
如果员工连续休假不同月份,那么该表格会存储日期范围不同月份的信息。
例如:在输入集中,员工已经从16-10-2016''至' 31-12-2016'并加入' 1-1-2017'
所以此项目有3条记录,但日期是连续的。
在输出中,我需要将其作为一条记录,如预期的输出数据集中所示。
INPUT
Id Status StartDate EndDate
1 Active 1-9-2007 15-10-2016
1 Sick 16-10-2016 31-10-2016
1 Sick 1-11-2016 30-11-2016
1 Sick 1-12-2016 31-12-2016
1 Active 1-1-2017 4-2-2017
1 Unpaid 5-2-2017 9-2-2017
1 Active 10-2-2017 11-2-2017
1 Unpaid 12-2-2017 28-2-2017
1 Unpaid 1-3-2017 31-3-2017
1 Unpaid 1-4-2017 30-4-2017
1 Active 1-5-2017 13-10-2017
1 Sick 14-10-2017 11-11-2017
1 Active 12-11-2017 NULL
预期输出
Id Status StartDate EndDate
1 Active 1-9-2007 15-10-2016
1 Sick 16-10-2016 31-12-2016
1 Active 1-1-2017 4-2-2017
1 Unpaid 5-2-2017 9-2-2017
1 Active 10-2-2017 11-2-2017
1 Unpaid 12-2-2017 30-4-2017
1 Active 1-5-2017 13-10-2017
1 Sick 14-10-2017 11-11-2017
1 Active 12-11-2017 NULL
我不能通过id,status取min(startdate)和max(EndDate)组,因为如果同一个员工已经再次生病,那么结束日期(' 11-11-2017&#39 ;在示例中)将作为结束日期。
任何人都可以帮我解决SQL Server 2014中的查询问题吗?
答案 0 :(得分:3)
我突然发现这基本上是一个gaps and islands问题 - 所以我完全改变了我的解决方案 要使此解决方案起作用,日期不必是连续的。
首先,创建并填充样本表(请在将来的问题中保存此步骤):
DECLARE @T AS TABLE
(
Id int,
Status varchar(10),
StartDate date,
EndDate date
);
SET DATEFORMAT DMY; -- This is needed because how you specified your dates.
INSERT INTO @T (Id, Status, StartDate, EndDate) VALUES
(1, 'Active', '1-9-2007', '15-10-2016'),
(1, 'Sick', '16-10-2016', '31-10-2016'),
(1, 'Sick', '1-11-2016', '30-11-2016'),
(1, 'Sick', '1-12-2016', '31-12-2016'),
(1, 'Active', '1-1-2017', '4-2-2017'),
(1, 'Unpaid', '5-2-2017', '9-2-2017'),
(1, 'Active', '10-2-2017', '11-2-2017'),
(1, 'Unpaid', '12-2-2017', '28-2-2017'),
(1, 'Unpaid', '1-3-2017', '31-3-2017'),
(1, 'Unpaid', '1-4-2017', '30-4-2017'),
(1, 'Active', '1-5-2017', '13-10-2017'),
(1, 'Sick', '14-10-2017', '11-11-2017'),
(1, 'Active', '12-11-2017', NULL);
(新)公用表表达式:
;WITH CTE AS
(
SELECT Id,
Status,
StartDate,
EndDate,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY StartDate)
- ROW_NUMBER() OVER(PARTITION BY Id, Status ORDER BY StartDate) As IslandId,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY StartDate DESC)
- ROW_NUMBER() OVER(PARTITION BY Id, Status ORDER BY StartDate DESC) As ReverseIslandId
FROM @T
)
(新)查询:
SELECT DISTINCT Id,
Status,
MIN(StartDate) OVER(PARTITION BY IslandId, ReverseIslandId) As StartDate,
NULLIF(MAX(ISNULL(EndDate, '9999-12-31')) OVER(PARTITION BY IslandId, ReverseIslandId), '9999-12-31') As EndDate
FROM CTE
ORDER BY StartDate
(新)结果:
Id Status StartDate EndDate
1 Active 01.09.2007 15.10.2016
1 Sick 16.10.2016 31.12.2016
1 Active 01.01.2017 04.02.2017
1 Unpaid 05.02.2017 09.02.2017
1 Active 10.02.2017 11.02.2017
1 Unpaid 12.02.2017 30.04.2017
1 Active 01.05.2017 13.10.2017
1 Sick 14.10.2017 11.11.2017
1 Active 12.11.2017 NULL
You can see a live demo on rextester.
请注意,SQL中日期的字符串表示应符合ISO 8601 - 意味着yyyy-MM-dd
或yyyyMMdd
,因为它是明确的,并且将始终由SQL Server正确解释。
答案 1 :(得分:1)
这是一个没有使用LAG的替代答案。
首先,我需要复制一份测试数据:
DECLARE @table TABLE (Id INT, [Status] VARCHAR(50), StartDate DATE, EndDate DATE);
INSERT INTO @table SELECT 1, 'Active', '20070901', '20161015';
INSERT INTO @table SELECT 1, 'Sick', '20161016', '20161031';
INSERT INTO @table SELECT 1, 'Sick', '20161101', '20161130';
INSERT INTO @table SELECT 1, 'Sick', '20161201', '20161231';
INSERT INTO @table SELECT 1, 'Active', '20170101', '20170204';
INSERT INTO @table SELECT 1, 'Unpaid', '20170205', '20170209';
INSERT INTO @table SELECT 1, 'Active', '20170210', '20170211';
INSERT INTO @table SELECT 1, 'Unpaid', '20170212', '20170228';
INSERT INTO @table SELECT 1, 'Unpaid', '20170301', '20170331';
INSERT INTO @table SELECT 1, 'Unpaid', '20170401', '20170430';
INSERT INTO @table SELECT 1, 'Active', '20170501', '20171013';
INSERT INTO @table SELECT 1, 'Sick', '20171014', '20171111';
INSERT INTO @table SELECT 1, 'Active', '20171112', NULL;
然后查询是:
WITH add_order AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY StartDate) AS order_id
FROM
@table),
links AS (
SELECT
a1.Id,
a1.[Status],
a1.order_id,
MIN(a1.order_id) AS start_order_id,
MAX(ISNULL(a2.order_id, a1.order_id)) AS end_order_id,
MIN(a1.StartDate) AS StartDate,
MAX(ISNULL(a2.EndDate, a1.EndDate)) AS EndDate
FROM
add_order a1
LEFT JOIN add_order a2 ON a2.Id = a1.Id AND a2.[Status] = a1.[Status] AND a2.order_id = a1.order_id + 1 AND a2.StartDate = DATEADD(DAY, 1, a1.EndDate)
GROUP BY
a1.Id,
a1.[Status],
a1.order_id),
merged AS (
SELECT
l1.Id,
l1.[Status],
l1.[StartDate],
ISNULL(l2.EndDate, l1.EndDate) AS EndDate,
ROW_NUMBER() OVER (PARTITION BY l1.Id, l1.[Status], ISNULL(l2.EndDate, l1.EndDate) ORDER BY l1.order_id) AS link_id
FROM
links l1
LEFT JOIN links l2 ON l2.order_id = l1.end_order_id)
SELECT
Id,
[Status],
StartDate,
EndDate
FROM
merged
WHERE
link_id = 1
ORDER BY
StartDate;
结果是:
Id Status StartDate EndDate
1 Active 2007-09-01 2016-10-15
1 Sick 2016-10-16 2016-12-31
1 Active 2017-01-01 2017-02-04
1 Unpaid 2017-02-05 2017-02-09
1 Active 2017-02-10 2017-02-11
1 Unpaid 2017-02-12 2017-04-30
1 Active 2017-05-01 2017-10-13
1 Sick 2017-10-14 2017-11-11
1 Active 2017-11-12 NULL
它是如何工作的?首先,我添加一个序列号,以帮助将连续的行合并在一起。然后我确定可以合并在一起的行,添加一个数字来标识每个可以合并的集合中的第一行,最后从最终的CTE中挑出第一行。请注意,我还必须处理无法合并的行,因此LEFT JOIN
和ISNULL
语句。
只是为了感兴趣,这是最终CTE的输出看起来的样子,然后我过滤除了link_id为1的所有行:
Id Status StartDate EndDate link_id
1 Active 2007-09-01 2016-10-15 1
1 Sick 2016-10-16 2016-12-31 1
1 Sick 2016-11-01 2016-12-31 2
1 Sick 2016-12-01 2016-12-31 3
1 Active 2017-01-01 2017-02-04 1
1 Unpaid 2017-02-05 2017-02-09 1
1 Active 2017-02-10 2017-02-11 1
1 Unpaid 2017-02-12 2017-04-30 1
1 Unpaid 2017-03-01 2017-04-30 2
1 Unpaid 2017-04-01 2017-04-30 3
1 Active 2017-05-01 2017-10-13 1
1 Sick 2017-10-14 2017-11-11 1
1 Active 2017-11-12 NULL 1
答案 2 :(得分:1)
这是GROUPING和WINDOW的一个例子。
searchController.dimsBackgroundDuringPresentation = false
Id | Status | StartDate | EndDate -: | :----- | :------------------ | :------------------ 1 | Active | 01/09/2007 00:00:00 | 15/10/2016 00:00:00 1 | Sick | 16/10/2016 00:00:00 | 31/12/2016 00:00:00 1 | Active | 01/01/2017 00:00:00 | 04/02/2017 00:00:00 1 | Unpaid | 05/02/2017 00:00:00 | 09/02/2017 00:00:00 1 | Active | 10/02/2017 00:00:00 | 11/02/2017 00:00:00 1 | Unpaid | 12/02/2017 00:00:00 | 30/04/2017 00:00:00 1 | Active | 01/05/2017 00:00:00 | 13/10/2017 00:00:00 1 | Sick | 14/10/2017 00:00:00 | 11/11/2017 00:00:00 1 | Active | 12/11/2017 00:00:00 | null
dbfiddle here
答案 3 :(得分:0)
您可以同时使用lag()
和lead()
功能来检查上一个和下一个状态
WITH CTE AS
(
select *,
COALESCE(LEAD(status) OVER(ORDER BY (select 1)), '0') Nstatus,
COALESCE(LAG(status) OVER(ORDER BY (select 1)), '0') Pstatus
from table
)
SELECT * FROM CTE
WHERE (status <> Nstatus AND status <> Pstatus) OR
(status <> Pstatus)