How do I correctly use ROW_NUMBER() (with partitioning) for my data pool?

Time: 2011-06-08 10:45:09

Tags: sql-server row-number

We have the following table (the output is already ordered and separated for easier reading):

| PK | FK1 | FK2 |   ActionCode |         CreationTS  | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
|  6 | 100 | 500 |       Create | 2011-01-02 00:00:00 |                  H |
----------------------------------------------------------------------------
|  3 | 100 | 500 |       Change | 2011-01-01 02:00:00 |                  Z |
|  2 | 100 | 500 |       Change | 2011-01-01 01:00:00 |                  X |
|  1 | 100 | 500 |       Create | 2011-01-01 00:00:00 |                  Y |
----------------------------------------------------------------------------
|  4 | 100 | 510 |       Create | 2011-01-01 00:30:00 |                  T |
----------------------------------------------------------------------------
|  5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 |                  A |
----------------------------------------------------------------------------

What is ActionCode? We use it in C#, where it represents an enum value.

What do I want to achieve?

Well, I need the following output:

| FK1 | FK2 |   ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 |       Create |                  H |
| 100 | 500 |       Create |                  Z |
| 100 | 510 |       Create |                  T |
| 100 | 520 | CreateSystem |                  A |
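-------------------------------------------------

Well, what is the actual logic? We have logical groups identified by the composite key (FK1 + FK2). Each of these groups can be broken into several partitions. Each partition begins with Create or CreateSystem and ends with Create, CreateSystem or Change. The actual value of SomeAttributeValue for each partition should be the value from the last row of that partition.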
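
The rule above can be checked against the sample rows with a small sketch outside the database. This is only an illustration in Python, not a replacement for the SQL being asked for; the tuple layout and the names `ROWS`, `OPENERS` and `collapse_partitions` are assumptions made for this example.

```python
from operator import itemgetter

# Sample rows from the question:
# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
ROWS = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]

OPENERS = {"Create", "CreateSystem"}  # action codes that start a partition


def collapse_partitions(rows):
    """Return one (FK1, FK2, ActionCode, SomeAttributeValue) tuple per partition."""
    result = []
    # Sort by group key, then chronologically (ISO timestamps sort as strings).
    for pk, fk1, fk2, action, ts, value in sorted(rows, key=itemgetter(1, 2, 4)):
        if action in OPENERS:
            # A Create/CreateSystem row opens a new partition.
            result.append([fk1, fk2, action, value])
        elif result and result[-1][:2] == [fk1, fk2]:
            # A Change row overwrites the value of the partition it follows.
            result[-1][3] = value
    return [tuple(r) for r in result]
```

Calling `collapse_partitions(ROWS)` yields the same four rows as the expected output table, although in chronological order within each group rather than in the order shown in the question.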

It is not possible to have the following data pool:

| PK | FK1 | FK2 |   ActionCode |         CreationTS  | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
|  7 | 100 | 500 |       Change | 2011-01-02 02:00:00 |                  Z |
|  6 | 100 | 500 |       Create | 2011-01-02 00:00:00 |                  H |
|  2 | 100 | 500 |       Change | 2011-01-01 01:00:00 |                  X |
|  1 | 100 | 500 |       Create | 2011-01-01 00:00:00 |                  Y |
----------------------------------------------------------------------------

and then expect PK 7 to affect PK 2, or PK 6 to affect PK 1.
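
Put differently, a Change row always belongs to the most recent Create before it. A minimal sketch of that membership rule, using the four rows of this second data pool (the function name `partition_members` and the tuple layout are assumptions for illustration):

```python
# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
SECOND_POOL = [
    (7, 100, 500, "Change", "2011-01-02 02:00:00", "Z"),
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
]


def partition_members(rows):
    """Map the PK of each opening row to the PKs of all rows in its partition."""
    members = {}
    current = {}  # (FK1, FK2) -> PK of the most recent opening row
    for pk, fk1, fk2, action, ts, _ in sorted(rows, key=lambda r: (r[1], r[2], r[4])):
        if action in ("Create", "CreateSystem"):
            current[(fk1, fk2)] = pk
            members[pk] = [pk]
        elif (fk1, fk2) in current:
            # A Change row joins the partition opened by the latest Create.
            members[current[(fk1, fk2)]].append(pk)
    return members
```

For `SECOND_POOL` this groups PK 1 with PK 2 and PK 6 with PK 7, so PK 7 never reaches back into the partition that PK 1 opened.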

I don't even know how or where to start... How can I achieve this? We are running MSSQL 2005+.

Edit
A dump is available:

  • instanceId: my PK
  • tenantId: FK 1
  • campaignId: FK 2
  • callId: FK 3
  • refillCounter: FK 4
  • ticketType: ActionCode (1, 4 and 6 are Create, 5 is Change, 3 must be ignored)
  • ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId and changeTS should be taken from the Create (the first row in the group)
  • callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which would then be "one partition in the group", i.e. the same row as above) or Change (the last row in the group)
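
The column rules in this list can be sketched as follows. This is a hedged illustration only: rows are modeled as plain dictionaries, the helper names `action_code` and `merge_partition` are hypothetical, and treating any ticketType other than 1, 4, 5 and 6 as "ignore" is an assumption (the list only says that 3 must be ignored).

```python
# ticketType values as described in the list above; anything else is ignored here.
TICKET_TYPE_TO_ACTION = {1: "Create", 4: "Create", 6: "Create", 5: "Change"}

# Columns taken from the first row (the Create) of a partition.
FROM_FIRST = (
    "ticketType", "profileId", "contactPersonId", "ownerId",
    "handlingStartTime", "handlingEndTime", "memo", "callWasPreselected",
    "creatorId", "creationTS", "changerId", "changeTS",
)
# Columns taken from the last row (Create or Change) of a partition.
FROM_LAST = (
    "callingState", "reasonId", "followUpDate",
    "callingAttempts", "callingAttemptsConsecutivelyNotReached",
)


def action_code(ticket_type):
    """Return 'Create' or 'Change', or None for rows that must be skipped."""
    return TICKET_TYPE_TO_ACTION.get(ticket_type)


def merge_partition(partition):
    """Combine a chronologically ordered, non-empty list of row dicts into one row."""
    first, last = partition[0], partition[-1]
    merged = {name: first.get(name) for name in FROM_FIRST}
    merged.update({name: last.get(name) for name in FROM_LAST})
    return merged
```

For a partition that consists of a single Create, `first` and `last` are the same row, which matches the "one partition in the group" case described above.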

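2 answers:

Answer 0: (score: 2)

I'm assuming that each partition can contain only a single Create or CreateSystem; otherwise your requirements are ill-defined. The following is untested, since I have neither a sample table nor sample data in an easy-to-use format: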

;With Partitions as (
     -- One row per partition: t1 is the row that opens it, t2 is the next
     -- opening row in the same FK1/FK2 group (NULL for the last partition).
     Select
         t1.FK1,
         t1.FK2,
         t1.ActionCode,   -- carried along because PartitionRows selects it
         t1.CreationTS as StartTS,
         t2.CreationTS as EndTS
     From
         Table t1
             left join
         Table t2
             on
                  t1.FK1 = t2.FK1 and
                  t1.FK2 = t2.FK2 and
                  t1.CreationTS < t2.CreationTS and
                  t2.ActionCode in ('Create','CreateSystem')
             left join
         Table t3
             on
                  t1.FK1 = t3.FK1 and
                  t1.FK2 = t3.FK2 and
                  t1.CreationTS < t3.CreationTS and
                  t3.CreationTS < t2.CreationTS and
                  t3.ActionCode in ('Create','CreateSystem')
       where
           t1.ActionCode in ('Create','CreateSystem') and
           t3.FK1 is null   -- no opening row between t1 and t2
), PartitionRows as (
     -- All rows of each partition, numbered from the newest to the oldest.
     SELECT
         t1.FK1,
         t1.FK2,
         t1.ActionCode,
         t2.SomeAttributeValue,
         ROW_NUMBER() OVER (PARTITION BY t1.FK1,t1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
     from
         Partitions t1
             inner join
         Table t2
             on
                t1.FK1 = t2.FK1 and
                t1.FK2 = t2.FK2 and
                t1.StartTS <= t2.CreationTS and
                (t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1

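(Note that I'm using various reserved names here.)

The basic logic is: the Partitions CTE defines each partition in terms of FK1, FK2, an inclusive start timestamp and an exclusive end timestamp. It does this with a triple join against the base table. The rows from t2 are selected to occur after the row from t1, and the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude every row of the result set where t3 produced a match. The result is that the row from t1 and the row from t2 represent the starts of two adjacent partitions.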
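
The effect of that triple join can be illustrated outside SQL. The sketch below is a Python approximation, not the query itself: it pairs each opening row with the next opening row in the same group, producing the same inclusive start and exclusive end timestamps. The function name `partition_bounds` and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
ROWS = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]


def partition_bounds(rows):
    """Return (FK1, FK2, StartTS, EndTS) per partition; EndTS is None for the last one."""
    starts = defaultdict(list)
    for pk, fk1, fk2, action, ts, value in rows:
        if action in ("Create", "CreateSystem"):
            starts[(fk1, fk2)].append(ts)
    bounds = []
    for (fk1, fk2), stamps in sorted(starts.items()):
        stamps.sort()
        # Pair each start with the next start in the same group
        # (None plays the role of the NULL EndTS from the left join).
        for start, end in zip(stamps, stamps[1:] + [None]):
            bounds.append((fk1, fk2, start, end))
    return bounds
```

A row then belongs to a partition when `StartTS <= CreationTS` and either `EndTS` is missing or `CreationTS < EndTS`, which is the join condition used by the second CTE.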

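The second CTE then retrieves all rows from Table for each partition, assigning a ROW_NUMBER() score within each partition based on CreationTS, sorted in descending order, so that ROW_NUMBER() 1 within each partition is the last row to occur.

Finally, in the select, we just pick the rows that occur last within their respective partitions.

All of this assumes that CreationTS values are distinct within each partition. If that assumption doesn't hold, I could probably rework it using PK as well.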
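
One way to picture such a rework is to use PK as a tie-breaker in the ordering. The Python sketch below mimics the numbering for a single partition; it rests on the assumption, not stated in the question, that a higher PK means a later row, and the function name `row_numbers_desc` is hypothetical.

```python
def row_numbers_desc(partition_rows):
    """Mimic ROW_NUMBER() OVER (ORDER BY CreationTS DESC, PK DESC) for one partition.

    Rows are (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue) tuples.
    Assumes that a higher PK was inserted later, so it wins a timestamp tie.
    """
    ordered = sorted(partition_rows, key=lambda r: (r[4], r[0]), reverse=True)
    return [(rn, row) for rn, row in enumerate(ordered, start=1)]


# Two Change rows share a timestamp; PK decides which one counts as the last.
tied = [
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (3, 100, 500, "Change", "2011-01-01 01:00:00", "Z"),
]
last_row = row_numbers_desc(tied)[0][1]  # the row that would get rn = 1
```

Without the tie-breaker, either of the two tied rows could receive row number 1, so the result would not be deterministic.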

Answer 1: (score: 0)

This can be solved with a recursive CTE. Here it is (assuming the rows within a partition are ordered by CreationTS):

WITH partitioned AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
  FROM data
),
subgroups AS (
  SELECT
    PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
    Subgroup = 1,
    Subrank  = 1
  FROM partitioned
  WHERE rn = 1
  UNION ALL
  SELECT
    p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
    Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
    Subrank  = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
  FROM partitioned p
    INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
      AND p.rn = s.rn + 1
),
finalranks AS (
  SELECT
    PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
    Subgroup, Subrank,
    rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
    /* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
  FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1
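
To see what the recursion computes, the Subgroup and Subrank counters can be traced with a plain loop. This is a Python sketch of the same walk over the question's sample rows, not part of the answer's query; the function names are assumptions for illustration.

```python
# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
ROWS = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]


def subgroup_ranks(rows):
    """Return (PK, Subgroup, Subrank) per row, walking each group in CreationTS order."""
    out = []
    prev_key, subgroup, subrank = None, 0, 0
    for pk, fk1, fk2, action, ts, value in sorted(rows, key=lambda r: (r[1], r[2], r[4])):
        if (fk1, fk2) != prev_key:
            # Anchor member of the recursive CTE: first row of the group.
            prev_key, subgroup, subrank = (fk1, fk2), 1, 1
        elif action == "Change":
            # Same subgroup, one position further along.
            subrank += 1
        else:
            # Any non-Change row starts a new subgroup.
            subgroup, subrank = subgroup + 1, 1
        out.append((pk, subgroup, subrank))
    return out


def last_pks(rows):
    """PKs of the rows with the highest Subrank in each (FK1, FK2, Subgroup)."""
    keys = {r[0]: (r[1], r[2]) for r in rows}
    best = {}
    for pk, subgroup, subrank in subgroup_ranks(rows):
        key = keys[pk] + (subgroup,)
        if key not in best or subrank > best[key][0]:
            best[key] = (subrank, pk)
    return sorted(pk for _, pk in best.values())
```

Note that the final SELECT of the query returns the last row of each subgroup as it is, so for the first partition of the sample it would report the ActionCode of PK 3 (Change) rather than the Create that opened the partition.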