Identifying a specific sequence of records in a table

Time: 2009-12-03 20:21:22

Tags: c# sql sql-server sql-server-2005

Assume a table with the fields TransactionId, ItemId, Code, EffectiveDate and CreateDate.

+---------------+--------+------+------------------+------------------+
| TransactionId | ItemId | Code |   EffectiveDate  |     CreateDate   |
+---------------+--------+------+------------------+------------------+
|              1|       1|     8| 12/2/2009 1:13 PM| 12/2/2009 1:13 PM|
+---------------+--------+------+------------------+------------------+
|              4|       1|    51|12/2/2009 11:08 AM| 12/3/2009 9:01 AM|
+---------------+--------+------+------------------+------------------+
|              2|       1|    14|12/2/2009 11:09 AM|12/2/2009 11:09 AM|
+---------------+--------+------+------------------+------------------+
|              3|       1|    61| 12/3/2009 8:33 AM| 12/3/2009 8:33 AM|
+---------------+--------+------+------------------+------------------+
|              5|       1|    28| 12/3/2009 9:33 AM| 12/3/2009 9:33 AM|
+---------------+--------+------+------------------+------------------+
|              6|       1|     9| 12/3/2009 1:58 PM| 12/3/2009 1:58 PM|
+---------------+--------+------+------------------+------------------+

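I need to get the set of records where the sequence 51, 61, 9 occurs for a given ItemId, ordered by EffectiveDate. There may be other records with other codes in between these records.

In this case, I would return TransactionIds 4, 3 and 6, as shown below.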

+---------------+--------+------+------------------+------------------+
| TransactionId | ItemId | Code |   EffectiveDate  |     CreateDate   |
+---------------+--------+------+------------------+------------------+
|              4|       1|    51|12/2/2009 11:08 AM| 12/3/2009 9:01 AM|
+---------------+--------+------+------------------+------------------+
|              3|       1|    61| 12/3/2009 8:33 AM| 12/3/2009 8:33 AM|
+---------------+--------+------+------------------+------------------+
|              6|       1|     9| 12/3/2009 1:58 PM| 12/3/2009 1:58 PM|
+---------------+--------+------+------------------+------------------+

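Please note:

  • This is not the only sequence I need to identify, but it illustrates the problem.
  • Records can be inserted into the table out of order; that is, the 61 record could be inserted first, then the 51, then the 9. You can see this in the example, where the Code 51 record has a CreateDate later than its EffectiveDate.
  • The order of the sequence matters. So the sequence 61, 9, 51 would return no records, but 51, 61, 9 would.

A DB approach is ideal if it is simple (i.e. no cursors or overly complex stored procedures), but a code approach could also work, although it would mean transferring a significant amount of data out of the DB.

The environment is SQL Server 2005 and C# / .NET 3.5.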

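3 answers:

Answer 0: (score: 1)

"A DB approach is ideal if it is simple (i.e. no cursors or overly complex stored procedures)."

I don't believe a pure DB approach ("pure" meaning only SQL SELECT) is practical, because the kind of SQL I envision would require very complicated self-joins, field concatenation, MAX() functions and so on. That kind of SQL might make an interesting academic answer in a Joe Celko "SQL for Smarties" book, but I don't think it is appropriate for production code.

I think the realistic approach is to write some kind of loop that tracks state. In the general sense, your problem is very similar to writing code for stateful inspection of TCP/IP packets for spam filtering, or for scanning credit card transactions for fraud patterns. All of these problems share one characteristic: the action you take on the current row (record) depends on the records you have seen before (the context), and that requires holding state variables.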
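As a rough illustration of such a state-tracking loop, here is a minimal Python sketch (Python is used only for brevity; the same loop could be written in C# or in a CLR procedure). The function name `find_sequence`, the tuple layout, and the choice to return the earliest in-order match are assumptions made for this example, not something stated in the question.

```python
from datetime import datetime

# Sample rows from the question: (TransactionId, ItemId, Code, EffectiveDate)
ROWS = [
    (1, 1, 8,  datetime(2009, 12, 2, 13, 13)),
    (4, 1, 51, datetime(2009, 12, 2, 11, 8)),
    (2, 1, 14, datetime(2009, 12, 2, 11, 9)),
    (3, 1, 61, datetime(2009, 12, 3, 8, 33)),
    (5, 1, 28, datetime(2009, 12, 3, 9, 33)),
    (6, 1, 9,  datetime(2009, 12, 3, 13, 58)),
]

def find_sequence(rows, item_id, codes):
    """Return the TransactionIds of the earliest in-order occurrence of codes, or []."""
    if not codes:
        return []
    matched = []   # TransactionIds matched so far
    position = 0   # state variable: index of the next code we are waiting for
    # Order this item's rows by EffectiveDate (index 3), not by insertion order.
    ordered = sorted((r for r in rows if r[1] == item_id), key=lambda r: r[3])
    for transaction_id, _item, code, _effective in ordered:
        if code == codes[position]:
            matched.append(transaction_id)
            position += 1
            if position == len(codes):
                return matched   # whole sequence seen, in order
    return []                    # sequence never completed

print(find_sequence(ROWS, 1, [51, 61, 9]))   # -> [4, 3, 6]
print(find_sequence(ROWS, 1, [61, 9, 51]))   # -> []
```

The only state carried between rows is `position`, so rows with unrelated codes in between are simply skipped.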

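If you want to avoid round-tripping the data for analysis, Transact-SQL looks like the best option for performance. Alternatively, use hosted CLR to take advantage of C# syntax while still keeping the processing inside the database engine.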

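Answer 1: (score: 1)

Actually, you can get some fairly straightforward solutions by leveraging ranking/windowing functions and/or CTEs and recursive CTEs.

Create a procedure that accepts a character-based, comma-separated list of the Code values you are looking for, in the order you want them. Use any of the dozen possible ways to split this list into a table/set made up of the sequence position and the Code value, so that you end up with a table structured like this: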

declare @sequence table (sequence int not null, Code int not null);
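
To make the shape of that table concrete, here is a small Python sketch of the splitting step. It is not one of the T-SQL split techniques the answer refers to; the function name `parse_sequence` is made up for this illustration.

```python
def parse_sequence(csv_codes):
    """Turn a comma-separated string of codes into (sequence, Code) pairs, numbered from 1."""
    codes = [int(part) for part in csv_codes.split(",") if part.strip()]
    return list(enumerate(codes, start=1))

print(parse_sequence("51,61,9"))   # -> [(1, 51), (2, 61), (3, 9)]
```

Each pair corresponds to one row of the `@sequence` table variable: the position in the list and the Code expected at that position.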

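Once you have this, it is simply a matter of sequencing the source set: join the sequence table to the source table on matching Code values for a given ItemId and number the resulting rows by EffectiveDate. Once the source set is filtered and sequenced, you can join again on the matching sequence values. This sounds complex, but in reality it is a single query like this: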

with srcData as (
    select  row_number() over(order by t.EffectiveDate) as rn,
            t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
    from    #TableName t
    join    @sequence s
    on      t.Code = s.Code
    where   t.ItemId = @item_id
)
select  d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from    srcData d
join    @sequence s
on      d.rn = s.sequence
and     d.Code = s.Code
order by d.rn;

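That alone won't guarantee that the resultset you get is identical to the one you are looking for, but staging the data into a temp table and adding a few simple checks around the code would do the trick (for example, a checksum validation and a sum of the Code values):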

declare @tempData table (rn int, TransactionId smallint, ItemId smallint, Code smallint, EffectiveDate datetime, CreateDate datetime);

with srcData as (
    select  row_number() over(order by t.EffectiveDate) as rn,
            t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
    from    #TableName t
    join    @sequence s
    on      t.Code = s.Code
    where   t.ItemId = @item_id
)
insert  @tempData
        (rn, TransactionId, ItemId, Code, EffectiveDate, CreateDate)
select  d.rn, d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from    srcData d
join    @sequence s
on      d.rn = s.sequence
and     d.Code = s.Code;

-- Verify we have matching hash/sums    
if
(
    ( (select sum(Code) from @sequence) = (select sum(Code) from @tempData) )
    and
    ( (select checksum_agg(checksum(sequence, Code)) from @sequence) = (select checksum_agg(checksum(rn, Code)) from @tempData) )
)
begin;
    -- Match - return the resultset
    select  d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
    from    @tempData d
    order by d.rn;

end;
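
To see what the row-number join accepts and rejects, here is a Python paraphrase of the logic (a sketch only, not the answer's code). It assumes the codes in the sequence are distinct, uses plain integers in place of EffectiveDate, and replaces the sum/checksum check with a simpler "every position matched" test; `match_by_row_number` is a made-up name.

```python
def match_by_row_number(rows, item_id, codes):
    """rows are (TransactionId, ItemId, Code, EffectiveDate) tuples with sortable dates."""
    wanted = set(codes)
    # srcData: this item's rows whose Code appears in the sequence, ordered by date
    src = sorted((r for r in rows if r[1] == item_id and r[2] in wanted),
                 key=lambda r: r[3])
    # join on rn = sequence and Code = Code
    joined = [r for rn, r in enumerate(src, start=1)
              if rn <= len(codes) and r[2] == codes[rn - 1]]
    # stand-in for the sum/checksum verification: all positions must have matched
    return [r[0] for r in joined] if len(joined) == len(codes) else []

base = [(4, 1, 51, 1), (3, 1, 61, 2), (6, 1, 9, 3)]
print(match_by_row_number(base, 1, [51, 61, 9]))                     # -> [4, 3, 6]
# An earlier, extra Code 51 record shifts every row number by one:
print(match_by_row_number(base + [(7, 1, 51, 0)], 1, [51, 61, 9]))   # -> []
```

On this reading, the technique matches only when the filtered rows line up position by position with the sequence. A repeated code before the sequence starts makes it return nothing, even though a 51, 61, 9 run still exists in the data.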

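If you want to do it all inline, you could use a different approach that leverages CTEs and recursion to perform a running sum/total and an OrdPath-like comparison (though you would still need to parse the sequence character data into a dataset):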

-- Sequence data with running total
with sequenceWithRunningTotal as
(
    -- Anchor
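    -- sumCode comes from a subquery: a windowed sum here would only see the
    -- single row left after the "where s.sequence = 1" filter
    select  s.sequence, s.Code, s.Code as runningTotal, cast(s.Code as varchar(8000)) as pth,
            (select sum(x.Code) from @sequence x) as sumCode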
    from    @sequence s
    where   s.sequence = 1
    -- Recurse
    union all
    select  s.sequence, s.Code, b.runningTotal + s.Code as runningTotal,
            b.pth + '.' + cast(s.Code as varchar(8000)) as pth,
            b.sumCode as sumCode
    from    @sequence s
    join    sequenceWithRunningTotal b
    on      s.sequence = b.sequence + 1
),
-- Source data with sequence value
srcData as 
(
    select  row_number() over(order by t.EffectiveDate) as rn,
            t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
            sum(t.Code) over(partition by 1) as sumCode
    from    #TableName t
    join    @sequence s
    on      t.Code = s.Code
    where   t.ItemId = @item_id
),
-- Source data with running sum
sourceWithRunningSum as
(
    -- Anchor
    select  t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
            t.Code as runningTotal, cast(t.Code as varchar(8000)) as pth,
            t.sumCode
    from    srcData t
    where   t.rn = 1
    -- Recurse
    union all
    select  t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
            s.runningTotal + t.Code as runningTotal,
            s.pth + '.' + cast(t.Code as varchar(8000)) as pth,
            t.sumCode
    from    srcData t
    join    sourceWithRunningSum s
    on      t.rn  = s.rn + 1
)
select  d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from    sourceWithRunningSum d
join    sequenceWithRunningTotal s
on      d.rn = s.sequence
and     d.Code = s.Code
and     d.runningTotal = s.runningTotal
and     d.pth = s.pth
and     d.sumCode = s.sumCode
order by d.rn;
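
The running-total and path comparison can be pictured with a short Python sketch. It paraphrases the idea rather than translating the query; the names `running_paths` and `matching_positions` are made up, and `source_codes` is assumed to be the already filtered, date-ordered list of codes for one ItemId.

```python
def running_paths(codes):
    """Return (position, code, running_total, dotted_path) for each code, in order."""
    out, total, path = [], 0, ""
    for position, code in enumerate(codes, start=1):
        total += code
        path = str(code) if position == 1 else path + "." + str(code)
        out.append((position, code, total, path))
    return out

def matching_positions(source_codes, wanted_codes):
    """Positions where code, running total and path all agree, given equal grand totals."""
    if sum(source_codes) != sum(wanted_codes):
        return []                      # mirrors the sumCode comparison
    wanted = set(running_paths(wanted_codes))
    return [row[0] for row in running_paths(source_codes) if row in wanted]

print(running_paths([51, 61, 9]))
# -> [(1, 51, 51, '51'), (2, 61, 112, '51.61'), (3, 9, 121, '51.61.9')]
print(matching_positions([51, 61, 9], [51, 61, 9]))   # -> [1, 2, 3]
print(matching_positions([51, 9, 61], [51, 61, 9]))   # -> [1]
```

Because each path contains every earlier code, equal paths at position n imply that all earlier positions agree as well. In the out-of-order case a matching prefix still joins, so a caller may want to confirm that every position matched before trusting the result.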

Answer 2: (score: 0)

This is just off the top of my head and is untested, so it may need some tweaking:

SELECT DISTINCT
     T.TransactionID,
     T.ItemID,
     T.Code,
     T.EffectiveDate,
     T.CreateDate
FROM
     My_Table T
INNER JOIN (
     -- One row per (51, 61, 9) combination whose EffectiveDates are in order.
     -- The three IDs need distinct aliases to be usable outside the subquery.
     SELECT
          T1.TransactionID AS FirstID,
          T2.TransactionID AS SecondID,
          T3.TransactionID AS ThirdID
     FROM
          My_Table T1
     INNER JOIN My_Table T2 ON
          T2.ItemID = T1.ItemID AND
          T2.Code = 61 AND
          T2.EffectiveDate > T1.EffectiveDate
     INNER JOIN My_Table T3 ON
          T3.ItemID = T1.ItemID AND
          T3.Code = 9 AND
          T3.EffectiveDate > T2.EffectiveDate
     WHERE
          T1.Code = 51
     ) SQ ON
     T.TransactionID IN (SQ.FirstID, SQ.SecondID, SQ.ThirdID)
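
For comparison, the self-join idea can be sketched in Python as "try every combination of candidate rows and keep those whose dates increase". This is an illustration under assumptions: `self_join_match` is a made-up name, dates are plain integers, and the function generalizes the three-way join to a sequence of any length.

```python
from itertools import product

def self_join_match(rows, item_id, codes):
    """rows are (TransactionId, ItemId, Code, EffectiveDate); returns distinct matching ids."""
    # One candidate list per position in the sequence (the T1, T2, T3 aliases).
    candidates = [[r for r in rows if r[1] == item_id and r[2] == code]
                  for code in codes]
    ids = set()
    for combo in product(*candidates):
        # Keep the combination only if EffectiveDate strictly increases along it.
        if all(a[3] < b[3] for a, b in zip(combo, combo[1:])):
            ids.update(r[0] for r in combo)
    return sorted(ids)

rows = [(4, 1, 51, 1), (7, 1, 51, 2), (3, 1, 61, 3), (6, 1, 9, 4)]
print(self_join_match(rows, 1, [51, 61, 9]))   # -> [3, 4, 6, 7]
print(self_join_match(rows, 1, [61, 9, 51]))   # -> []
```

With two Code 51 records before the 61, both take part in a valid combination, so both are returned. This approach therefore yields every row that participates in any in-order occurrence, not a single occurrence, and the number of combinations grows with each repeated code.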