How to calculate the longest winning streak with a date range

Time: 2014-06-07 14:34:35

Tags: sql sql-server-2008

I have a table in SQL Server that contains data like this:

userID  amount  startdate              enddate        game    Result
-------------------------------------------------------------------------------
68838   51.00   2014-05-29 15:41:41.167 2014-05-29 15:41:41.167 1   W
68838   51.00   2014-05-29 15:42:30.757 2014-05-29 15:42:30.757 1   W
68838   -0.50   2014-05-31 16:57:31.033 2014-05-31 16:57:31.033 1   L
68838   -0.50   2014-05-31 17:05:31.023 2014-05-31 17:05:31.023 1   L
68838   -0.50   2014-05-31 17:22:03.857 2014-05-31 17:22:03.857 1   L
68838   0.42    2014-05-31 17:26:32.570 2014-05-31 17:26:32.570 1   W
68838   0.42    2014-05-31 17:34:45.330 2014-05-31 17:34:45.330 1   W
68838   0.42    2014-05-31 17:38:44.107 2014-05-31 17:38:44.107 1   W
68838   0.42    2014-05-31 17:42:12.790 2014-05-31 17:42:12.790 1   W
434278  0.42    2014-05-31 16:57:31.033 2014-05-31 16:57:31.033 1   W
434278  0.42    2014-05-31 17:05:31.023 2014-05-31 17:05:31.023 1   W
434278  0.42    2014-05-31 17:22:03.857 2014-05-31 17:22:03.857 1   W
434278  -0.50   2014-05-31 17:26:32.570 2014-05-31 17:26:32.570 1   L
434278  -0.50   2014-05-31 17:34:45.330 2014-05-31 17:34:45.330 1   L
434278  -0.50   2014-05-31 17:38:44.107 2014-05-31 17:38:44.107 1   L
434278  -0.50   2014-05-31 17:42:12.790 2014-05-31 17:42:12.790 1   L
434278  0.42    2014-05-31 17:46:40.723 2014-05-31 17:46:40.723 1   W
434278  -0.50   2014-05-31 17:51:26.190 2014-05-31 17:51:26.190 1   L
434278  0.42    2014-05-31 17:55:32.870 2014-05-31 17:55:32.870 1   W
434278  -4.00   2014-05-31 18:06:54.937 2014-05-31 18:06:54.937 1   L
434278  -2.00   2014-05-31 18:19:29.483 2014-05-31 18:19:29.483 1   L

I want the result to look like this, showing the longest winning streak for each user:

UserId StartDate                  Enddate                    Streak  amount
--------------------------------------------------------------------
68838  2014-05-31 17:26:32:570    2014-05-31 17:42:12:570     4       1.68
434278  2014-05-31 16:57:31:033   2014-05-31 17:22:03:857     3       1.26

3 answers:

Answer 0 (score: 2)

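Disclaimer: Glenn's answer is a good one and does a lot of the heavy lifting for you, but it doesn't do exactly what you asked for. I was going to post an answer I had been working on, but when I came to add it I saw that Glenn's way of doing the heavy lifting was better than mine, so I reworked my answer to use his approach. I would urge you to accept his answer rather than mine.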

The following should do exactly what you asked for.

SELECT
    Userid,
    Min_StartDate as StartDate,
    Max_EndDate as EndDate,
    max_group_count as Streak,
    sum_Amount as Amount

FROM (
    SELECT
        *,
        -- if several streaks tie for the maximum, we want the earliest one
        min(Min_StartDate) OVER (PARTITION BY userid) as Earliest_StartDate

    FROM (
        SELECT
            *,
            -- we want the maximum streak
            max(max_group_count) OVER (PARTITION BY userid) as MAX_Streak
        FROM (    
            SELECT DISTINCT
                  userid,
                  -- Calculate this streak
                  COUNT(grouping) OVER (PARTITION BY userid, grouping
                             ORDER BY startdate
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND UNBOUNDED FOLLOWING ) as max_group_count
                  -- Calculate the start date of this streak
                  ,MIN(StartDate) OVER (PARTITION BY userid, grouping
                             ORDER BY startdate
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND UNBOUNDED FOLLOWING ) as Min_StartDate
                  -- Calculate the end date of this streak
                  ,MAX(EndDate) OVER (PARTITION BY userid, grouping
                             ORDER BY startdate
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND UNBOUNDED FOLLOWING ) as Max_EndDate
                  -- Calculate the total amount
                  ,SUM(Amount) OVER (PARTITION BY userid, grouping
                             ORDER BY startdate
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND UNBOUNDED FOLLOWING ) as Sum_Amount         

              FROM ( SELECT *
                        -- Assign a group number to the streak, so we can group by it
                       ,SUM(CASE WHEN result <> prev_result THEN 1 ELSE 0 END) OVER
                       (PARTITION BY userid ORDER BY startdate) AS grouping

                   FROM ( SELECT *
                         -- We want to look at the previous record to determine when the
                         -- winning/losing streak starts and ends
                        ,COALESCE(LAG(result) OVER
                            (PARTITION BY userid ORDER BY startdate), result) AS prev_result
                        FROM game
                    ) a

                   WHERE result = 'W'

                   ) b
             ) c
      ) d 
  WHERE
      Max_Group_Count = Max_Streak

) e
WHERE
  Min_StartDate = Earliest_StartDate

The output of this is:

| USERID |                  STARTDATE |                    ENDDATE | STREAK | AMOUNT |
|--------|----------------------------|----------------------------|--------|--------|
|  68838 | May, 31 2014 17:26:32+0000 | May, 31 2014 17:42:12+0000 |      4 |   1.68 |
| 434278 | May, 31 2014 16:57:31+0000 | May, 31 2014 17:22:03+0000 |      3 |   1.26 |

If you like, I've put this up as a SQL Fiddle you can play with: http://sqlfiddle.com/#!6/32777/36/0
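
As a cross-check outside the database, here is a minimal Python sketch of the same idea: order each user's rows by startdate, split them into runs of identical Result, keep the 'W' runs, and take the longest one, preferring the earliest on a tie. The function name `longest_win_streaks` and the tuple layout are assumptions made for illustration; they are not part of the query above.

```python
from decimal import Decimal
from itertools import groupby


def longest_win_streaks(rows):
    """rows: (user_id, amount, start, end, result) tuples (hypothetical layout).

    Returns {user_id: (start, end, streak, amount)} for each user's longest 'W' run.
    """
    best = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for user_id, user_rows in groupby(ordered, key=lambda r: r[0]):
        # Consecutive rows with the same result form one run.
        for result, run in groupby(user_rows, key=lambda r: r[4]):
            if result != "W":
                continue
            run = list(run)
            candidate = (
                min(r[2] for r in run),   # start of the streak
                max(r[3] for r in run),   # end of the streak
                len(run),                 # streak length
                sum(r[1] for r in run),   # total amount won
            )
            # Strict '>' keeps the earliest run when two runs have equal length.
            if user_id not in best or candidate[2] > best[user_id][2]:
                best[user_id] = candidate
    return best


# Rows for user 68838, copied from the question (game column omitted).
sample = [
    (68838, Decimal("51.00"), "2014-05-29 15:41:41.167", "2014-05-29 15:41:41.167", "W"),
    (68838, Decimal("51.00"), "2014-05-29 15:42:30.757", "2014-05-29 15:42:30.757", "W"),
    (68838, Decimal("-0.50"), "2014-05-31 16:57:31.033", "2014-05-31 16:57:31.033", "L"),
    (68838, Decimal("-0.50"), "2014-05-31 17:05:31.023", "2014-05-31 17:05:31.023", "L"),
    (68838, Decimal("-0.50"), "2014-05-31 17:22:03.857", "2014-05-31 17:22:03.857", "L"),
    (68838, Decimal("0.42"), "2014-05-31 17:26:32.570", "2014-05-31 17:26:32.570", "W"),
    (68838, Decimal("0.42"), "2014-05-31 17:34:45.330", "2014-05-31 17:34:45.330", "W"),
    (68838, Decimal("0.42"), "2014-05-31 17:38:44.107", "2014-05-31 17:38:44.107", "W"),
    (68838, Decimal("0.42"), "2014-05-31 17:42:12.790", "2014-05-31 17:42:12.790", "W"),
]
result = longest_win_streaks(sample)
print(result[68838])
```

The timestamps are kept as ISO-style strings here because they sort in chronological order as plain text; real code would more likely use datetime values.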

Answer 1 (score: 0)

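Here is something to get you started. You can use it as an inner query and work outward from there. Basically, it first adds an extra column to every row showing the previous result. Then, every time the previous result differs from the current one, that is treated as a group switch. Each grouping gets its own number (0-based, and within the context of a userid). The 'L' groupings are thrown away. What you are then interested in is the grouping with the maximum count for each user.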

SELECT *
      ,COUNT(grouping) OVER (PARTITION BY userid, grouping
                 ORDER BY startdate
                 ROWS BETWEEN UNBOUNDED PRECEDING
                      AND UNBOUNDED FOLLOWING ) as max_group_count

  FROM ( SELECT *
           ,SUM(CASE WHEN result <> prev_result THEN 1 ELSE 0 END) OVER
           (PARTITION BY userid ORDER BY startdate) AS grouping

       FROM ( SELECT *
            ,COALESCE(LAG(result) OVER
                (PARTITION BY userid ORDER BY startdate), result) AS prev_result
            FROM game
        ) x

       WHERE result = 'W'

       ) y

Result:

 userid |        startdate        |         enddate         | result | prev_result | grouping | max_group_count
--------+-------------------------+-------------------------+--------+-------------+----------+-----------------
  68838 | 2014-05-29 15:41:41.167 | 2014-05-29 15:41:41.167 | W      | W           |        0 |               2
  68838 | 2014-05-29 15:42:30.757 | 2014-05-29 15:42:30.757 | W      | W           |        0 |               2
  68838 | 2014-05-31 17:26:32.57  | 2014-05-31 17:26:32.57  | W      | L           |        1 |               4
  68838 | 2014-05-31 17:34:45.33  | 2014-05-31 17:34:45.33  | W      | W           |        1 |               4
  68838 | 2014-05-31 17:38:44.107 | 2014-05-31 17:38:44.107 | W      | W           |        1 |               4
  68838 | 2014-05-31 17:42:12.79  | 2014-05-31 17:42:12.79  | W      | W           |        1 |               4
 434278 | 2014-05-31 16:57:31.033 | 2014-05-31 16:57:31.033 | W      | W           |        0 |               3
 434278 | 2014-05-31 17:05:31.023 | 2014-05-31 17:05:31.023 | W      | W           |        0 |               3
 434278 | 2014-05-31 17:22:03.857 | 2014-05-31 17:22:03.857 | W      | W           |        0 |               3
 434278 | 2014-05-31 17:46:40.723 | 2014-05-31 17:46:40.723 | W      | L           |        1 |               1
 434278 | 2014-05-31 17:55:32.87  | 2014-05-31 17:55:32.87  | W      | L           |        2 |               1
(11 rows)
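
To make the numbering easier to follow, here is a small Python sketch that mimics the two window steps for a single user: the previous-result column (LAG with the COALESCE fallback) and the running count of switches over the remaining 'W' rows. The function name `number_win_groups` is hypothetical and the input is simply that user's Result values in startdate order.

```python
from collections import Counter


def number_win_groups(results):
    """results: one user's 'W'/'L' values in startdate order.

    Returns a (prev_result, grouping) pair for every 'W' row.
    """
    out = []
    grouping = 0
    prev = None
    for result in results:
        # COALESCE(LAG(result), result): the first row is compared with itself.
        prev_result = result if prev is None else prev
        if result == "W":  # WHERE result = 'W' removes losses before numbering
            if result != prev_result:
                grouping += 1  # a win that follows a loss opens a new group
            out.append((prev_result, grouping))
        prev = result
    return out


# Result column of user 434278 from the question, in startdate order.
groups = number_win_groups(list("WWWLLLLWLWLL"))
sizes = Counter(g for _, g in groups)
print(groups)
print(max(sizes.values()))
```

For user 434278 this yields groupings 0, 0, 0, 1, 2 for the five wins, matching the grouping column in the result above, and the largest group has 3 rows.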

Answer 2 (score: 0)

In SQL Server 2008, SUM cannot be used with OVER (ORDER BY ...), which makes the query more complicated, but not impossible:

;WITH myID AS (
  SELECT userID, amount, startdate, enddate, game, Result
       , ID = Row_Number() OVER (Partition By userID ORDER BY startdate)
  FROM   Table1 t1
), SR AS (
  SELECT t1.userID, t1.startdate, t1.enddate, t1.Result, t1.amount
       , SUM(CASE WHEN t1.Result <> COALESCE(t2.Result, t1.Result) 
                  THEN 1 
                  ELSE 0 END) SC
  FROM   myID t1
         LEFT JOIN myID t2 ON t1.userID = t2.userID AND t1.ID >= t2.ID
  GROUP BY t1.userID, t1.startdate, t1.enddate, t1.Result, t1.amount
), SL AS (
  SELECT userID, Result, SC, Count(1) Streak
       , Row_Number() OVER (PARTITION BY userID ORDER BY Count(1) DESC) Pos
  FROM   SR
  WHERE  Result = 'W'
  GROUP BY userID, Result, SC
)
SELECT p.userID
     , MIN(p.startdate) startdate
     , MAX(p.enddate) enddate
     , l.Streak
     , SUM(p.Amount) Amount
FROM   SR p
       INNER JOIN SL l ON p.userID = l.userID AND p.SC = l.SC
                      AND p.Result = l.Result -- an 'L' row can share an SC value with a 'W' streak
WHERE  l.Pos = 1
GROUP BY p.userID, l.Streak

SQLFiddle demo

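The myID CTE generates an integer ID for the data to simplify the join condition in the next CTE. If the data shown in the question is not the whole table and a column with the same effect already exists, this CTE should be removed.
The CTE SR (StreakRank) adds the streak counter SC to the data, using a triangular JOIN to generate a rank for every streak. It is not a dense rank; it is just something to group by.
The CTE SL (StreakLength) gets the length of every winning streak and ranks the streaks by length.
The main query puts everything together: it takes the longest streak from the CTE SL and JOINs it to SR to get the details.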
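
The triangular JOIN can be hard to picture, so here is a rough Python sketch of what the SC column works out to for one user. For each row it counts the rows up to and including it (ID <= current ID) whose Result differs from the current row's. The function name `streak_counters` is an assumption for illustration only.

```python
def streak_counters(results):
    """results: one user's 'W'/'L' values in startdate order.

    Returns the SC value of every row: how many rows at or before it
    have a different result.
    """
    return [
        sum(1 for earlier in results[: i + 1] if earlier != current)
        for i, current in enumerate(results)
    ]


# Result column of user 68838 from the question, in startdate order.
results = list("WWLLLWWWW")
sc = streak_counters(results)
print(sc)

# Group the winning rows by SC to get the streak lengths.
lengths = {}
for result, counter in zip(results, sc):
    if result == "W":
        lengths[counter] = lengths.get(counter, 0) + 1
print(lengths)
```

For a 'W' row, SC is the number of losses before it, so all rows of one winning streak share a value and two different winning streaks never do. A losing row can still land on the same number as a winning streak (for W, L, W, W the counters are 0, 1, 1, 1), so SC is only a usable grouping key together with Result.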