Multiple sub-group aggregations

Asked: 2015-04-24 16:14:04

Tags: sql-server sql-server-2008

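I'm trying to create sub-records from aggregations over the existing records in our data. I have one table that lists records and another table that logs the actions taken on each record. The records table looks like this: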

Key  OpenDate  LastUpdate
aa   1/1/2015  1/14/2015
bb   1/3/2015  1/15/2015

The actions table looks like this:

    Key  Date     Action
    aa  1/1/2015    Working
    aa  1/4/2015    Escalated
    aa  1/5/2015    Done
    aa  1/6/2015    Working
    aa  1/7/2015    Done
    aa  1/13/2015   Done
    aa  1/14/2015   Working
    bb  1/3/2015    Working
    bb  1/4/2015    Working
    bb  1/5/2015    Escalated
    bb  1/6/2015    Working
    bb  1/7/2015    Done
    bb  1/13/2015   Working
    bb  1/15/2015   Done  

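I want to create one row each time a record reaches 'Done'. The row records the start and end of that cycle and counts some items within that date range: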

Key  SubID  DateBegin   DateEnd   #Actions #Escalations
aa   1     1/1/2015    1/5/2015    3       1
aa   2     1/6/2015    1/7/2015    2       0
aa   3     1/13/2015   1/13/2015   1       0
aa   4     1/14/2015   null        1       0
bb   1     1/3/2015    1/7/2015    5       1
bb   2     1/13/2015   1/15/2015   2       0

Basically, a sub-record ends when Action = 'Done', and a new sub-record starts with the next action after that (as well as with the very first action).
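To pin down the rule before writing SQL, here is a hypothetical procedural sketch in Python over the sample rows above: walk each key's actions in date order, close the current cycle on every 'Done', and leave a trailing cycle open (end date `None`, shown as null in the desired output). The names `split_cycles` and `summarize` and the tuple layout are assumptions for illustration only.

```python
from datetime import date
from itertools import groupby

# Sample rows from the actions table above: (key, date, action).
ACTIONS = [
    ("aa", date(2015, 1, 1), "Working"),
    ("aa", date(2015, 1, 4), "Escalated"),
    ("aa", date(2015, 1, 5), "Done"),
    ("aa", date(2015, 1, 6), "Working"),
    ("aa", date(2015, 1, 7), "Done"),
    ("aa", date(2015, 1, 13), "Done"),
    ("aa", date(2015, 1, 14), "Working"),
    ("bb", date(2015, 1, 3), "Working"),
    ("bb", date(2015, 1, 4), "Working"),
    ("bb", date(2015, 1, 5), "Escalated"),
    ("bb", date(2015, 1, 6), "Working"),
    ("bb", date(2015, 1, 7), "Done"),
    ("bb", date(2015, 1, 13), "Working"),
    ("bb", date(2015, 1, 15), "Done"),
]


def summarize(key, sub_id, cycle, closed):
    """Build one output row: (key, sub_id, begin, end, n_actions, n_escalations)."""
    begin = cycle[0][1]
    end = cycle[-1][1] if closed else None  # an open cycle has no end date yet
    escalations = sum(1 for _, _, action in cycle if action == "Escalated")
    return (key, sub_id, begin, end, len(cycle), escalations)


def split_cycles(actions):
    """Split each key's actions into cycles that end on a 'Done' action."""
    result = []
    ordered = sorted(actions, key=lambda row: (row[0], row[1]))
    for key, rows in groupby(ordered, key=lambda row: row[0]):
        sub_id = 0
        cycle = []
        for row in rows:
            cycle.append(row)
            if row[2] == "Done":  # 'Done' closes the current cycle
                sub_id += 1
                result.append(summarize(key, sub_id, cycle, closed=True))
                cycle = []
        if cycle:  # trailing actions after the last 'Done' form an open cycle
            sub_id += 1
            result.append(summarize(key, sub_id, cycle, closed=False))
    return result
```

Running `split_cycles(ACTIONS)` reproduces the six rows of the desired output table, which makes it a handy reference when checking a SQL version.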

The solution I found here works when the data contains only one record, but it breaks down with more than one. I'm using SQL Server 2008.

Update: I now get multiple records back, but the dates don't look right, and I'm not sure the query returns what it should:

SELECT Records.[Key], Cycles.CYCLE_BEGIN_DATE, Cycles.CYCLE_END_DATE, Cycles.NUM_ACTIONS_IN_CYCLE
FROM Records
FULL OUTER JOIN
    (select e.[Key], min(e.[Date]) as CYCLE_BEGIN_DATE,
          max(case when e.[Action] = 'Done' then e.[Date] end) as CYCLE_END_DATE,
          count(*) as NUM_ACTIONS_IN_CYCLE
     from (select [Key], [Action], rowID = ROW_NUMBER() OVER (PARTITION BY [Key] ORDER BY [Date] asc), [Date]
           from Actions
          ) e
     outer apply
          (select count(*) as grp
           from (select [Key], rowID = ROW_NUMBER() OVER (PARTITION BY [Key] ORDER BY [Date] asc), [Date], [Action]
                 from Actions
                ) e2
           where e2.[Date] < e.[Date] and e2.[Action] = 'Done' and e.[Key] = e2.[Key]
          ) e2
     group by e.[Key], e2.grp
    ) Cycles
on Records.[Key] = Cycles.[Key]

1 Answer:

Answer 0 (score: 1)

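I think this is basically the same as the earlier question. For each action, you want to count the 'Done' actions for the same key that occur strictly before it. That count is a group identifier, which you can then aggregate on.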

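In SQL Server 2012+ you would use a cumulative sum (a windowed `SUM() OVER` with an `ORDER BY`). In earlier versions you can do the same thing with a correlated subquery or `OUTER APPLY`.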
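A minimal sketch of the cumulative-sum idea: a running count of 'Done' rows up to and including the current row, minus the current row's own 'Done' flag, equals the number of 'Done' rows strictly before it. The windowed `SUM ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` form is what SQL Server 2012+ accepts; to keep the example runnable, it executes the query in SQLite (3.25+ supports the same window syntax and bracketed identifiers) through Python's `sqlite3` module. The in-memory table, the ISO-format date strings, and the `cycle_ids` helper are assumptions for the demo, and it assumes no two actions for the same key share a date.

```python
import sqlite3

# Sample rows from the actions table in the question; ISO dates sort correctly as text.
ACTIONS = [
    ("aa", "2015-01-01", "Working"), ("aa", "2015-01-04", "Escalated"),
    ("aa", "2015-01-05", "Done"), ("aa", "2015-01-06", "Working"),
    ("aa", "2015-01-07", "Done"), ("aa", "2015-01-13", "Done"),
    ("aa", "2015-01-14", "Working"),
    ("bb", "2015-01-03", "Working"), ("bb", "2015-01-04", "Working"),
    ("bb", "2015-01-05", "Escalated"), ("bb", "2015-01-06", "Working"),
    ("bb", "2015-01-07", "Done"), ("bb", "2015-01-13", "Working"),
    ("bb", "2015-01-15", "Done"),
]

# grp = running count of 'Done' rows (inclusive) minus this row's own 'Done' flag,
# i.e. the number of 'Done' rows strictly before this one for the same key.
GRP_SQL = """
SELECT [Key], [Date], [Action],
       SUM(CASE WHEN [Action] = 'Done' THEN 1 ELSE 0 END)
           OVER (PARTITION BY [Key] ORDER BY [Date]
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       - CASE WHEN [Action] = 'Done' THEN 1 ELSE 0 END AS grp
FROM Actions
ORDER BY [Key], [Date]
"""


def cycle_ids(rows):
    """Load rows into an in-memory Actions table and return (key, date, action, grp)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Actions ([Key] TEXT, [Date] TEXT, [Action] TEXT)")
        conn.executemany("INSERT INTO Actions VALUES (?, ?, ?)", rows)
        return conn.execute(GRP_SQL).fetchall()
    finally:
        conn.close()
```

Each row in a cycle gets the same `grp`, so grouping by key and `grp` yields one row per cycle. Subtracting the current row's flag, rather than ending the frame at `1 PRECEDING`, avoids a NULL sum on each key's first row.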

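This version changes your query above in several ways. In particular, it simplifies the logic that defines `grp`. I don't see how `row_number()` fits into the query. I understand the idea (enumerate the 'Done' actions and aggregate on that number), but propagating that value to every row in the group is non-trivial.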

SELECT r.[Key], a.CYCLE_BEGIN_DATE, a.CYCLE_END_DATE, a.NUM_ACTIONS_IN_CYCLE
FROM Records r LEFT OUTER JOIN
     (select a.[Key], a2.grp, min(a.[Date]) as CYCLE_BEGIN_DATE,
             max(case when a.[Action] = 'Done' then a.[Date] end) as CYCLE_END_DATE,
             count(*) as NUM_ACTIONS_IN_CYCLE
      from Actions a outer apply
           -- grp = number of 'Done' actions for the same key strictly before this action
           (select count(*) as grp
            from Actions a2
            where a2.[Key] = a.[Key] and a2.[Date] < a.[Date] and a2.[Action] = 'Done'
           ) a2
      group by a.[Key], a2.grp
     ) a
    on r.[Key] = a.[Key];
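The desired output also includes SubID and #Escalations, which the query above does not return. One possible extension, offered as a sketch: because `grp` counts the 'Done' actions strictly before each row, its values run 0, 1, 2, ... within each key, so `grp + 1` can serve as SubID, and a conditional `SUM` counts escalations. To keep it runnable, the demo below uses SQLite through Python's `sqlite3` module. SQLite has no `OUTER APPLY`, so the `grp` lookup is written as an equivalent scalar correlated subquery, a form SQL Server 2008 also accepts. The join to `Records` is omitted because it only matters for keys with no actions. The table layout, ISO date strings, and the `cycles` helper are assumptions, and it assumes no two actions for the same key share a date.

```python
import sqlite3

# Sample rows from the actions table in the question; ISO dates sort correctly as text.
ACTIONS = [
    ("aa", "2015-01-01", "Working"), ("aa", "2015-01-04", "Escalated"),
    ("aa", "2015-01-05", "Done"), ("aa", "2015-01-06", "Working"),
    ("aa", "2015-01-07", "Done"), ("aa", "2015-01-13", "Done"),
    ("aa", "2015-01-14", "Working"),
    ("bb", "2015-01-03", "Working"), ("bb", "2015-01-04", "Working"),
    ("bb", "2015-01-05", "Escalated"), ("bb", "2015-01-06", "Working"),
    ("bb", "2015-01-07", "Done"), ("bb", "2015-01-13", "Working"),
    ("bb", "2015-01-15", "Done"),
]

CYCLES_SQL = """
SELECT [Key],
       grp + 1 AS SubID,                       -- grp is 0-based and gap-free per key
       MIN([Date]) AS DateBegin,
       MAX(CASE WHEN [Action] = 'Done' THEN [Date] END) AS DateEnd,  -- NULL if still open
       COUNT(*) AS NumActions,
       SUM(CASE WHEN [Action] = 'Escalated' THEN 1 ELSE 0 END) AS NumEscalations
FROM (
    SELECT a.[Key], a.[Date], a.[Action],
           (SELECT COUNT(*) FROM Actions a2
             WHERE a2.[Key] = a.[Key] AND a2.[Date] < a.[Date]
               AND a2.[Action] = 'Done') AS grp  -- 'Done' actions strictly before this one
    FROM Actions a
) t
GROUP BY [Key], grp
ORDER BY [Key], grp
"""


def cycles(rows):
    """Load rows into an in-memory Actions table and return one row per cycle."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Actions ([Key] TEXT, [Date] TEXT, [Action] TEXT)")
        conn.executemany("INSERT INTO Actions VALUES (?, ?, ?)", rows)
        return conn.execute(CYCLES_SQL).fetchall()
    finally:
        conn.close()
```

On the sample data, `cycles(ACTIONS)` returns the same six rows as the desired output table, with `None` as the end date of the open `aa` cycle.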