Splitting groups with related aggregates beyond a basic GROUP BY?

Time: 2017-07-18 16:35:34

Tags: sql sql-server group-by

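I'm not sure whether this has been asked before, because I don't quite know how to phrase the question myself. I think the best way to explain my dilemma is with an example.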

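Say I have rated my happiness on a scale of 1 to 10 every day for 10 years, and I have the results in one big table, with a single date corresponding to a single integer value for my happiness rating. Say, however, that I only care about my happiness averaged over 60-day periods (this may seem odd, but it is a simplified example). So I roll this information up into a table that has a start date field, an end date field, and an average rating field. The start dates cover every day from the first day to the last across all 10 years, and each end date is exactly 60 days after its start date. To be clear, these 60-day periods overlap (one shares 59 days with the next, 58 with the one after that, and so on).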

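Next I pick a threshold rating, say 5, and I want to classify everything below it as "bad" and everything above it as "good". I can easily add another field and use a CASE expression to give each 60-day range a "good" or "bad" flag.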

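Then, to summarize, I want to display the "good" and "bad" ranges from the minimum start date to the maximum end date. This is where I'm stuck. I could group by the good/bad category and just take min(start date) and max(end date), but if, for example, the ranges go from good to bad to good and back to bad, the output would show one good range and one bad range that overlap. In that situation I want to show four distinct ranges.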

I realize this probably seems clearer to me than it will to anyone else, so if you need clarification, just ask.

Thanks

--- EDIT ---

Here is an example of the "before" data:

StartDate | EndDate | MoodRating
------------ + ------------ + ------------
1/1/1991 | 3/1/1991 | 7
1/2/1991 | 3/2/1991 | 7
1/3/1991 | 3/3/1991 | 4
1/4/1991 | 3/4/1991 | 4
1/5/1991 | 3/5/1991 | 7
1/6/1991 | 3/6/1991 | 7
1/7/1991 | 3/7/1991 | 4
1/8/1991 | 3/8/1991 | 4
1/9/1991 | 3/9/1991 | 4

After:

MinStart | MaxEnd | Good/Bad
------------ + ------------ + ------------
1/1/1991 | 3/2/1991 | Good
1/3/1991 | 3/4/1991 | Bad
1/5/1991 | 3/6/1991 | Good
1/7/1991 | 3/9/1991 | Bad

Currently, my GROUP BY query would show:

MinStart | MaxEnd | Good/Bad
------------ + ------------ + ------------
1/1/1991 | 3/6/1991 | Good
1/3/1991 | 3/9/1991 | Bad

That output comes from a query along these lines:

SELECT MIN(StartDate), MAX(EndDate), Good_Bad
FROM sourcetable
GROUP BY Good_Bad

2 answers:

Answer 0 (score: 0)

Is this what you're looking for?

IF OBJECT_ID('tempdb..#MyDailyMood', 'U') IS NOT NULL 
DROP TABLE #MyDailyMood;

CREATE TABLE #MyDailyMood (
    TheDate DATE NOT NULL,
    MoodLevel INT NOT NULL 
    );

WITH 
    cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), 
    cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
    cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
    cte_Calendar (dt) AS (
        SELECT TOP (DATEDIFF(dd, '2007-01-01', '2017-01-01'))
            DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '2007-01-01')
        FROM
            cte_n3 a CROSS JOIN cte_n3 b
        )
INSERT #MyDailyMood (TheDate, MoodLevel)  
SELECT 
    c.dt,
    ABS(CHECKSUM(NEWID()) % 10) + 1
FROM
    cte_Calendar c;

--==========================================================

WITH 
    cte_AddRN AS (
        SELECT 
            *,
            RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
        FROM
            #MyDailyMood mdm
        ),
    cte_AssignGroups AS (
        SELECT 
            *,
            DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
        FROM
            cte_AddRN arn
        )
SELECT 
    BegOfRange = MIN(ag.TheDate),
    EndOfRange = MAX(ag.TheDate),
    AverageMoodLevel = AVG(ag.MoodLevel),   -- AVG over an INT column returns an INT (fraction truncated)
    GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END
FROM
    cte_AssignGroups ag
GROUP BY 
    ag.DateGroup;

Updated solution, following the OP's edit...

WITH 
    cte_AddRN AS (  -- Add a row number to each row that resets to 1 every 60 rows.
        SELECT 
            *,
            RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
        FROM
            #MyDailyMood mdm
        ),
    cte_AssignGroups AS (   -- Use DENSE_RANK to create groups based on the RN added above.
                            -- How it works: RN numbers the rows 1 - 60 and then repeats,
                            -- but we don't want every 60th row grouped together. We want blocks of 60 consecutive rows grouped together.
                            -- DENSE_RANK accomplishes this by ranking within all the "1's", "2's"... and so on.
                            -- Verify with the following query... SELECT * FROM cte_AssignGroups ag ORDER BY ag.TheDate
        SELECT 
            *,
            DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
        FROM
            cte_AddRN arn
        ),
    cte_AggRange AS (   -- This is just a straightforward aggregation/rollup. It produces results similar to the sample data you posted in your edit.
        SELECT 
            BegOfRange = MIN(ag.TheDate),
            EndOfRange = MAX(ag.TheDate),
            AverageMoodLevel = AVG(ag.MoodLevel),
            GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END,
            ag.DateGroup
        FROM
            cte_AssignGroups ag
        GROUP BY 
            ag.DateGroup
        ),
    cte_CompactGroup AS (   -- This time we're using dense rank to group all of the consecutive "Good" and "Bad" values so that they can be further aggregated below.
        SELECT 
            ar.BegOfRange, ar.EndOfRange, ar.AverageMoodLevel, ar.GorB, ar.DateGroup,
            DenseGroup = ar.DateGroup - DENSE_RANK() OVER (PARTITION BY ar.GorB ORDER BY ar.BegOfRange)
        FROM
            cte_AggRange ar
        )
-- The final aggregation step...
SELECT 
    BegOfRange = MIN(cg.BegOfRange),
    EndOfRange = MAX(cg.EndOfRange),
    cg.GorB
FROM
    cte_CompactGroup cg
GROUP BY 
    cg.DenseGroup,
    cg.GorB
ORDER BY 
    BegOfRange;
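
For reference, the same row-number-difference idea can be applied directly to the pre-aggregated table from the question's edit, skipping the 60-day rollup. This is only a sketch under a few assumptions: the table variable @SampleRanges and the column alias Island are names made up for illustration, the dates are typed in from the question's sample rows (read as month/day/year), and the threshold of 5 with ">= 5 is Good" follows the code above.

```sql
-- Hypothetical table variable holding the sample rows from the question's edit.
DECLARE @SampleRanges TABLE (StartDate DATE, EndDate DATE, MoodRating INT);

INSERT INTO @SampleRanges (StartDate, EndDate, MoodRating)
VALUES
    ('19910101', '19910301', 7),
    ('19910102', '19910302', 7),
    ('19910103', '19910303', 4),
    ('19910104', '19910304', 4),
    ('19910105', '19910305', 7),
    ('19910106', '19910306', 7),
    ('19910107', '19910307', 4),
    ('19910108', '19910308', 4),
    ('19910109', '19910309', 4);

WITH
    cte_Flagged AS (    -- Classify each range against the threshold.
        SELECT
            sr.StartDate,
            sr.EndDate,
            Good_Bad = CASE WHEN sr.MoodRating >= 5 THEN 'Good' ELSE 'Bad' END
        FROM
            @SampleRanges sr
        ),
    cte_Islands AS (    -- The difference between the overall row number and the
                        -- per-category row number stays constant within a run of
                        -- consecutive rows that share the same flag.
        SELECT
            f.StartDate,
            f.EndDate,
            f.Good_Bad,
            Island = ROW_NUMBER() OVER (ORDER BY f.StartDate)
                   - ROW_NUMBER() OVER (PARTITION BY f.Good_Bad ORDER BY f.StartDate)
        FROM
            cte_Flagged f
        )
SELECT
    MinStart = MIN(i.StartDate),
    MaxEnd = MAX(i.EndDate),
    i.Good_Bad
FROM
    cte_Islands i
GROUP BY
    i.Good_Bad,
    i.Island
ORDER BY
    MinStart;
```

Grouping by both Good_Bad and Island matters here: the Island value alone can repeat across the two categories, so it only identifies a run when paired with the flag. On the nine sample rows this should produce the four separate ranges shown in the question's "after" table.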

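Answer 1 (score: 0)

While Jason A Long's answer may well be correct, I can't read it or figure it out, so I thought I'd post my own. Assuming this isn't a process you'll be running constantly, the performance hit of a CURSOR shouldn't matter much. And (at least to me) this solution is very readable and easy to modify.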

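In short: we insert the first record from the source table into the results table. Next, we grab the next record and check whether its mood rating falls on the same side of the threshold (good or bad) as the previous record's. If it does, we simply update the previous result row's end date with the current record's end date (extending the range). If not, we insert a new record. Rinse, repeat. Simple.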

Here is your setup with some sample data:

DECLARE @MoodRanges TABLE (StartDate DATE, EndDate DATE, MoodRating int)

INSERT INTO @MoodRanges
VALUES
('1/1/1991','3/1/1991', 7),
('1/2/1991','3/2/1991', 7),
('1/3/1991','3/3/1991', 4),
('1/4/1991','3/4/1991', 4),
('1/5/1991','3/5/1991', 7),
('1/6/1991','3/6/1991', 7),
('1/7/1991','3/7/1991', 4),
('1/8/1991','3/8/1991', 4),
('1/9/1991','3/9/1991', 4)

Next, we create a table to store our results, along with some placeholder variables for the cursor:

DECLARE @MoodResults TABLE(ID INT IDENTITY(1, 1), StartDate DATE, EndDate DATE, MoodScore varchar(50))
DECLARE @CurrentStartDate DATE, @CurrentEndDate DATE, @CurrentMoodScore INT, 
        @PreviousStartDate DATE, @PreviousEndDate DATE, @PreviousMoodScore INT

Now we load all of the sample data into a CURSOR:

DECLARE MoodCursor CURSOR FOR
SELECT StartDate, EndDate, MoodRating
FROM @MoodRanges
ORDER BY StartDate  -- without an ORDER BY the fetch order is not guaranteed

OPEN MoodCursor
FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore

WHILE @@FETCH_STATUS = 0
    BEGIN

    IF @PreviousStartDate IS NOT NULL 
        BEGIN

        IF (@PreviousMoodScore >= 5 AND @CurrentMoodScore >= 5)
        OR  (@PreviousMoodScore < 5 AND @CurrentMoodScore < 5)
            BEGIN
                UPDATE @MoodResults
                SET EndDate = @CurrentEndDate
                WHERE ID = (SELECT MAX(ID) FROM @MoodResults)
            END
        ELSE
            BEGIN
                INSERT INTO 
                @MoodResults
                VALUES
                (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
            END
        END
    ELSE
        BEGIN
            INSERT INTO 
            @MoodResults
            VALUES
            (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
        END


    SET @PreviousStartDate = @CurrentStartDate
    SET @PreviousEndDate = @CurrentEndDate
    SET @PreviousMoodScore = @CurrentMoodScore

    FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
    END

CLOSE MoodCursor
DEALLOCATE MoodCursor

Here are the results:

SELECT * FROM @MoodResults

ID          StartDate  EndDate    MoodScore
----------- ---------- ---------- --------------------------------------------------
1           1991-01-01 1991-03-02 GOOD
2           1991-01-03 1991-03-04 BAD
3           1991-01-05 1991-03-06 GOOD
4           1991-01-07 1991-03-09 BAD
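
If the cursor ever does become a bottleneck, the same "compare with the previous row" logic can be sketched as a single set-based query. This is a rough sketch, not a drop-in replacement: it assumes SQL Server 2012 or later (for LAG and the windowed running SUM), it inlines the sample rows in a derived table so it can run on its own, and the names cte_Source, IsNewRange and RangeId are invented for illustration.

```sql
-- Standalone sketch; if pasted after other statements, the preceding
-- statement must end with a semicolon before WITH.
WITH
    cte_Source AS (     -- Sample rows from the question, inlined for illustration.
        SELECT
            StartDate = CAST(v.StartDate AS DATE),
            EndDate = CAST(v.EndDate AS DATE),
            MoodScore = CASE WHEN v.MoodRating >= 5 THEN 'GOOD' ELSE 'BAD' END
        FROM (VALUES
            ('19910101', '19910301', 7),
            ('19910102', '19910302', 7),
            ('19910103', '19910303', 4),
            ('19910104', '19910304', 4),
            ('19910105', '19910305', 7),
            ('19910106', '19910306', 7),
            ('19910107', '19910307', 4),
            ('19910108', '19910308', 4),
            ('19910109', '19910309', 4)
            ) v (StartDate, EndDate, MoodRating)
        ),
    cte_Changes AS (    -- Flag a row with 1 when its category differs from the previous
                        -- row's. For the first row LAG returns NULL, so it is flagged too.
        SELECT
            s.StartDate,
            s.EndDate,
            s.MoodScore,
            IsNewRange = CASE WHEN LAG(s.MoodScore) OVER (ORDER BY s.StartDate) = s.MoodScore
                              THEN 0 ELSE 1 END
        FROM
            cte_Source s
        ),
    cte_Numbered AS (   -- A running total of the flags gives every row in a run the same id.
        SELECT
            c.StartDate,
            c.EndDate,
            c.MoodScore,
            RangeId = SUM(c.IsNewRange) OVER (ORDER BY c.StartDate ROWS UNBOUNDED PRECEDING)
        FROM
            cte_Changes c
        )
SELECT
    ID = n.RangeId,
    StartDate = MIN(n.StartDate),
    EndDate = MAX(n.EndDate),
    n.MoodScore
FROM
    cte_Numbered n
GROUP BY
    n.RangeId,
    n.MoodScore
ORDER BY
    n.RangeId;
```

The flag plays the role of the cursor's IF/ELSE branch (extend the current range versus start a new one), and the running sum replaces the "update the row with MAX(ID)" step.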