MSSQL: Create an incrementing row label for each group

Time: 2017-08-22 11:41:44

Tags: sql-server

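My table has a primary key and a date. I want to increment a label (the Goal column) every time there is a break between consecutive dates.

Below is an example. The break column is computed with the LEAD function (I thought it might help).

I could solve it using T-SQL, but that would be a last resort. Nothing I have tried so far has worked. I am using MSSQL 2014.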

PK  |   Date  | break | Goal |
-------------------------------
1   | 03/2017 |   0   |  1   |
1   | 04/2017 |   0   |  1   |
1   | 08/2017 |   1   |  2   |
1   | 09/2017 |   0   |  2   |
1   | 10/2017 |   0   |  2   |
1   | 02/2018 |   1   |  3   |
1   | 03/2018 |   0   |  3   |

Here is the code to reproduce this example:

CREATE TABLE #test 
    (
    ConsumerId INT,
    FullDate DATE,
    Goal INT
    )

INSERT INTO #test (ConsumerId, FullDate, Goal) VALUES (1,'2017-03-01',1)
INSERT INTO #test (ConsumerId, FullDate, Goal) VALUES (1,'2017-04-01',1)
INSERT INTO #test (ConsumerId, FullDate, Goal) VALUES (1,'2017-08-01',2)
INSERT INTO #test (ConsumerId, FullDate, Goal) VALUES (1,'2017-09-01',2)
INSERT INTO #test (ConsumerId, FullDate, Goal) VALUES (1,'2017-10-01',2)
INSERT INTO #test (ConsumerId, FullDate, Goal) VALUES (1,'2018-02-01',3)
INSERT INTO #test (ConsumerId, FullDate, Goal) VALUES (1,'2018-03-01',3)

SELECT      ConsumerId,
            FullDate,
            CASE WHEN (datediff(month,
                                isnull(
                                      LEAD (FullDate,1) OVER (PARTITION BY ConsumerId ORDER BY FullDate DESC),
                                      FullDate),
                                      FullDate) > 1) 
                 THEN 1 
                 ELSE 0 
            END AS [break],  -- bracketed because BREAK is a reserved keyword in T-SQL
            Goal
FROM        #test
ORDER BY    FullDate ASC

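Edit

As pointed out in the comments, this is evidently the well-known "gaps and islands" problem. Googling it turns up many solutions on SO, along with related questions.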

2 Answers:

Answer 0: (score: 2)

Try this...

WITH 
    cte_TestGap AS (
        SELECT 
            t.ConsumerId, t.FullDate,
            Gap = CASE 
                        WHEN DATEDIFF(mm, t.FullDate, LAG(t.FullDate, 1) OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate)) = -1 
                        THEN 0 
                        ELSE ROW_NUMBER() OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate) 
                    END 
        FROM
            #test t
        ),
    cte_SmearGap AS (
        SELECT 
            tg.ConsumerId, tg.FullDate,
            GV = MAX(tg.Gap) OVER (PARTITION BY tg.ConsumerId ORDER BY tg.FullDate ROWS UNBOUNDED PRECEDING)
        FROM
            cte_TestGap tg
        )
SELECT 
    sg.ConsumerId, sg.FullDate,
    GroupValue = DENSE_RANK() OVER (PARTITION BY sg.ConsumerId ORDER BY sg.GV)
FROM
    cte_SmearGap sg;

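An explanation of the code and how it works... The first query, cte_TestGap, uses the LAG function together with ROW_NUMBER() to mark where the gaps in the data are: a row that follows its predecessor by exactly one month gets 0, and every other row (including the first) gets its row number. We can see this by running that CTE on its own and looking at the results...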

WITH 
    cte_TestGap AS (
        SELECT 
            t.ConsumerId, t.FullDate,
            Gap = CASE 
                        WHEN DATEDIFF(mm, t.FullDate, LAG(t.FullDate, 1) OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate)) = -1 
                        THEN 0 
                        ELSE ROW_NUMBER() OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate) 
                    END 
        FROM
            #test t
        )
    SELECT * FROM cte_TestGap;

cte_TestGap results...

ConsumerId  FullDate   Gap
----------- ---------- --------------------
1           2017-03-01 1
1           2017-04-01 0
1           2017-08-01 3
1           2017-09-01 0
1           2017-10-01 0
1           2018-02-01 6
1           2018-03-01 0

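At this point we want each 0 value to take on the preceding non-zero value, so that the rows can be grouped together. This is done in the second query (cte_SmearGap) using the MAX function with a window frame. Looking at the output of cte_SmearGap, we can see...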

WITH 
    cte_TestGap AS (
        SELECT 
            t.ConsumerId, t.FullDate,
            Gap = CASE 
                        WHEN DATEDIFF(mm, t.FullDate, LAG(t.FullDate, 1) OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate)) = -1 
                        THEN 0 
                        ELSE ROW_NUMBER() OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate) 
                    END 
        FROM
            #test t
        ),
    cte_SmearGap AS (
        SELECT 
            tg.ConsumerId, tg.FullDate,
            GV = MAX(tg.Gap) OVER (PARTITION BY tg.ConsumerId ORDER BY tg.FullDate ROWS UNBOUNDED PRECEDING)
        FROM
            cte_TestGap tg
        )
    SELECT * FROM cte_SmearGap;

cte_SmearGap results...

ConsumerId  FullDate   GV
----------- ---------- --------------------
1           2017-03-01 1
1           2017-04-01 1
1           2017-08-01 3
1           2017-09-01 3
1           2017-10-01 3
1           2018-02-01 6
1           2018-03-01 6
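
The running MAX works here because the non-zero Gap markers are row numbers, which only grow as the date increases, so the largest value seen so far is always the marker of the most recent gap. A minimal sketch of just that step follows. It assumes the cte_TestGap output is hard-coded in a VALUES list (the alias v is illustrative only), and it writes the frame out in full: ROWS UNBOUNDED PRECEDING is shorthand for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

```sql
-- Hard-coded copy of the cte_TestGap output (illustrative only).
SELECT
    v.FullDate,
    v.Gap,
    -- Running maximum from the first row up to and including the current row.
    GV = MAX(v.Gap) OVER (ORDER BY v.FullDate
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM (VALUES
        (CAST('2017-03-01' AS DATE), 1),
        (CAST('2017-04-01' AS DATE), 0),
        (CAST('2017-08-01' AS DATE), 3),
        (CAST('2017-09-01' AS DATE), 0),
        (CAST('2017-10-01' AS DATE), 0),
        (CAST('2018-02-01' AS DATE), 6),
        (CAST('2018-03-01' AS DATE), 0)
     ) AS v (FullDate, Gap)
ORDER BY v.FullDate;
```

For this input the GV column comes out as 1, 1, 3, 3, 3, 6, 6, matching the cte_SmearGap results.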

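At this point every row belongs to the right group... but... we would like the group numbers to form a contiguous sequence (1, 2, 3) rather than (1, 3, 6). That is easy to fix with the DENSE_RANK() function, which is what happens in the final SELECT...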

WITH 
    cte_TestGap AS (
        SELECT 
            t.ConsumerId, t.FullDate,
            Gap = CASE 
                        WHEN DATEDIFF(mm, t.FullDate, LAG(t.FullDate, 1) OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate)) = -1 
                        THEN 0 
                        ELSE ROW_NUMBER() OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate) 
                    END 
        FROM
            #test t
        ),
    cte_SmearGap AS (
        SELECT 
            tg.ConsumerId, tg.FullDate,
            GV = MAX(tg.Gap) OVER (PARTITION BY tg.ConsumerId ORDER BY tg.FullDate ROWS UNBOUNDED PRECEDING)
        FROM
            cte_TestGap tg
        )
SELECT 
    sg.ConsumerId, sg.FullDate,
    GroupValue = DENSE_RANK() OVER (PARTITION BY sg.ConsumerId ORDER BY sg.GV)
FROM
    cte_SmearGap sg;

Final results...

ConsumerId  FullDate   GroupValue
----------- ---------- --------------------
1           2017-03-01 1
1           2017-04-01 1
1           2017-08-01 2
1           2017-09-01 2
1           2017-10-01 2
1           2018-02-01 3
1           2018-03-01 3
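
As a possible shortcut, the same labels can be produced with a running SUM over a 0/1 break flag, which avoids the smear and DENSE_RANK steps. This is a sketch rather than part of the answer above: it assumes the #test table from the question still exists in the session, treats any jump of more than one month as a break (the rule used in the question's own query), and the names cte_Flag and IsBreak are made up for illustration.

```sql
WITH
    cte_Flag AS (
        SELECT
            t.ConsumerId, t.FullDate,
            -- 1 when the previous row for this consumer is more than one month earlier.
            -- For the first row LAG returns NULL, the comparison is unknown, and the flag is 0.
            IsBreak = CASE
                          WHEN DATEDIFF(MONTH,
                                        LAG(t.FullDate, 1) OVER (PARTITION BY t.ConsumerId ORDER BY t.FullDate),
                                        t.FullDate) > 1
                          THEN 1
                          ELSE 0
                      END
        FROM
            #test t
        )
SELECT
    f.ConsumerId, f.FullDate,
    -- The number of breaks seen so far, plus 1, is the group label.
    GroupValue = 1 + SUM(f.IsBreak) OVER (PARTITION BY f.ConsumerId ORDER BY f.FullDate ROWS UNBOUNDED PRECEDING)
FROM
    cte_Flag f
ORDER BY
    f.ConsumerId, f.FullDate;
```

On the sample data the flags are 0, 0, 1, 0, 0, 1, 0, so GroupValue is 1, 1, 2, 2, 2, 3, 3, the same as the Goal column in the question.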

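Answer 1: (score: 1)

David Browne's comment was actually very useful. If you Google "gaps and islands", you will find many variants of the solution. Below is the one I like best.

In the end, I only needed the Goal column so that I could group the dates into MIN / MAX ranges. This solution skips that step and creates the aggregated ranges directly.

Here is the source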

SELECT      MIN(FullDate) AS range_start,
            MAX(FullDate) AS range_end
FROM        (
            SELECT      FullDate,
                        DATEADD(MM, -1 * ROW_NUMBER() OVER(ORDER BY FullDate), FullDate) AS grp
            FROM        #test
            ) a
GROUP BY    a.grp

Output:

range_start | range_end  |
--------------------------
2017-03-01  | 2017-04-01 |
2017-08-01  | 2017-10-01 |
2018-02-01  | 2018-03-01 |
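
The trick is that subtracting the row number (in months) from each date gives the same anchor date for every row of an unbroken run. On the sample data the grp values are 2017-02-01 for the first two rows, 2017-05-01 for the next three, and 2017-08-01 for the last two. The query above numbers all rows together, which is fine for a single consumer. A sketch of a per-consumer variant follows, assuming the #test table from the question and at most one row per consumer per month (duplicate months would shift the row numbers and split a run):

```sql
SELECT      a.ConsumerId,
            MIN(a.FullDate) AS range_start,
            MAX(a.FullDate) AS range_end
FROM        (
            SELECT      ConsumerId,
                        FullDate,
                        -- Restart the numbering for each consumer so that runs
                        -- from different consumers never share an anchor date.
                        DATEADD(MONTH,
                                -1 * ROW_NUMBER() OVER (PARTITION BY ConsumerId ORDER BY FullDate),
                                FullDate) AS grp
            FROM        #test
            ) a
GROUP BY    a.ConsumerId, a.grp
ORDER BY    a.ConsumerId, range_start
```

With the single consumer in the sample data this returns the same three ranges as above, with ConsumerId added as the first column.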