Suppose a table contains the fields TransactionId, ItemId, Code, EffectiveDate, and CreateDate.
+---------------+--------+------+------------------+------------------+
| TransactionId | ItemId | Code | EffectiveDate    | CreateDate       |
+---------------+--------+------+------------------+------------------+
|              1|       1|     8| 12/2/2009 1:13 PM| 12/2/2009 1:13 PM|
+---------------+--------+------+------------------+------------------+
|              4|       1|    51|12/2/2009 11:08 AM| 12/3/2009 9:01 AM|
+---------------+--------+------+------------------+------------------+
|              2|       1|    14|12/2/2009 11:09 AM|12/2/2009 11:09 AM|
+---------------+--------+------+------------------+------------------+
|              3|       1|    61| 12/3/2009 8:33 AM| 12/3/2009 8:33 AM|
+---------------+--------+------+------------------+------------------+
|              5|       1|    28| 12/3/2009 9:33 AM| 12/3/2009 9:33 AM|
+---------------+--------+------+------------------+------------------+
|              6|       1|     9| 12/3/2009 1:58 PM| 12/3/2009 1:58 PM|
+---------------+--------+------+------------------+------------------+
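I need to get the set of records in which the sequence of codes 51, 61, 9 occurs for a given ItemId, ordered by EffectiveDate. There may be other records with other codes in between these records.
In this case, I would return TransactionIds 4, 3, and 6, as shown below.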
+---------------+--------+------+------------------+------------------+
| TransactionId | ItemId | Code | EffectiveDate    | CreateDate       |
+---------------+--------+------+------------------+------------------+
|              4|       1|    51|12/2/2009 11:08 AM| 12/3/2009 9:01 AM|
+---------------+--------+------+------------------+------------------+
|              3|       1|    61| 12/3/2009 8:33 AM| 12/3/2009 8:33 AM|
+---------------+--------+------+------------------+------------------+
|              6|       1|     9| 12/3/2009 1:58 PM| 12/3/2009 1:58 PM|
+---------------+--------+------+------------------+------------------+
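Please note:
A DB approach is ideal if it is simple (i.e. no cursors or overly complex stored procedures), but a code approach would also work, although it would mean transferring a significant amount of data out of the DB.
The environment is SQL Server 2005 and C#/.NET 3.5.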
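Answer 0 (score: 1)
"A DB approach is ideal if it is simple (i.e. no cursors or overly complex stored procedures)"
I don't believe a pure DB approach ("pure" meaning SQL SELECT only) is practical, because the kind of SQL I envision would require very convoluted self-joins, field concatenation, MAX() functions, and so on. It is the kind of SQL that might make an interesting academic answer in a book like Joe Celko's "SQL for Smarties", but I don't think it is appropriate for production code.
I think the realistic approach is to write some kind of loop that tracks state. In the general sense, the problem is very similar to writing code for stateful inspection of TCP/IP packets for spam filtering, or for scanning credit card transactions for fraud patterns. All of these problems share one characteristic: the action you take on the current row (record) depends on the records you have seen before (the context), and that requires keeping state variables.
If you want to avoid round-tripping the data for analysis, Transact-SQL looks like the best option for performance. Alternatively, use the hosted CLR to take advantage of C# syntax while still keeping the processing inside the database engine.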
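The state-tracking loop described above can be sketched in a few lines. This is only an illustration in Python, not the T-SQL or CLR code you would deploy: the function name `find_sequence` and the dictionary row layout are made up for the example, and it assumes the first in-order match per ItemId is the one wanted (the question does not say what should happen when a code repeats).

```python
from datetime import datetime

# Sample rows mirroring the question's table (CreateDate omitted, the loop does not need it).
ROWS = [
    {"TransactionId": 1, "ItemId": 1, "Code": 8,  "EffectiveDate": datetime(2009, 12, 2, 13, 13)},
    {"TransactionId": 4, "ItemId": 1, "Code": 51, "EffectiveDate": datetime(2009, 12, 2, 11, 8)},
    {"TransactionId": 2, "ItemId": 1, "Code": 14, "EffectiveDate": datetime(2009, 12, 2, 11, 9)},
    {"TransactionId": 3, "ItemId": 1, "Code": 61, "EffectiveDate": datetime(2009, 12, 3, 8, 33)},
    {"TransactionId": 5, "ItemId": 1, "Code": 28, "EffectiveDate": datetime(2009, 12, 3, 9, 33)},
    {"TransactionId": 6, "ItemId": 1, "Code": 9,  "EffectiveDate": datetime(2009, 12, 3, 13, 58)},
]


def find_sequence(rows, item_id, pattern):
    """Return the first rows for item_id whose codes match pattern in order, else []."""
    if not pattern:
        return []
    # Order the item's rows by EffectiveDate, as the question requires.
    ordered = sorted(
        (r for r in rows if r["ItemId"] == item_id),
        key=lambda r: r["EffectiveDate"],
    )
    matched = []  # the state: len(matched) is the index of the next code to look for
    for row in ordered:
        if row["Code"] == pattern[len(matched)]:
            matched.append(row)
            if len(matched) == len(pattern):
                return matched
        # Rows with any other code are skipped; they may sit between matches.
    return []  # the sequence never completed


hits = find_sequence(ROWS, 1, [51, 61, 9])
print([r["TransactionId"] for r in hits])  # prints [4, 3, 6]
```

The only state carried between rows is how many codes have been matched so far, which is why the same logic ports easily to a T-SQL loop or a CLR procedure.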
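Answer 1 (score: 1)
Actually, you can get some fairly simple solutions by leveraging ranking/windowing functions and/or CTEs and recursive CTEs.
Create a procedure that accepts a character-based, comma-separated list of the Code values you want to find, in the order you want them. Use any of the dozen possible ways to split this list into a table/set made up of the sequence number and the Code value, giving a table with a structure like this: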
declare @sequence table (sequence int not null, Code int not null);
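Once you have that, it is just a matter of sequencing the source set: join the sequence table to the source table on matching Code values for the given ItemId. Once the source set is filtered and sequenced, you simply join again on matching sequence values. It sounds complicated, but it really comes down to a single query like this: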
with srcData as (
select row_number() over(order by t.EffectiveDate) as rn,
t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
from #TableName t
join @sequence s
on t.Code = s.Code
where t.ItemId = @item_id
)
select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from srcData d
join @sequence s
on d.rn = s.sequence
and d.Code = s.Code
order by d.rn;
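By itself this does not guarantee that the result set you get back is exactly the one you are looking for, but staging the data into a table variable and adding a few simple checks around the code does the trick (for example, comparing a checksum and the sum of the Code values):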
declare @tempData table (rn int, TransactionId smallint, ItemId smallint, Code smallint, EffectiveDate datetime, CreateDate datetime);
with srcData as (
select row_number() over(order by t.EffectiveDate) as rn,
t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate
from #TableName t
join @sequence s
on t.Code = s.Code
where t.ItemId = @item_id
)
insert @tempData
(rn, TransactionId, ItemId, Code, EffectiveDate, CreateDate)
select d.rn, d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from srcData d
join @sequence s
on d.rn = s.sequence
and d.Code = s.Code;
-- Verify we have matching hash/sums
if
(
( (select sum(Code) from @sequence) = (select sum(Code) from @tempData) )
and
( (select checksum_agg(checksum(sequence, Code)) from @sequence) = (select checksum_agg(checksum(rn, Code)) from @tempData) )
)
begin;
-- Match - return the resultset
select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from @tempData d
order by d.rn;
end;
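To see why the verification step matters, here is a rough Python imitation of what the row-number join does. It is a sketch only: the helper names `rownumber_join` and `is_full_match` are invented for the illustration, the input is just the list of Code values already ordered by EffectiveDate, and it assumes the codes in the sequence are distinct (a repeated code would make the SQL join on Code produce duplicate rows).

```python
def rownumber_join(codes_by_date, sequence):
    """Imitate the CTE query: filter to wanted codes, number them, join on position and code."""
    wanted = set(sequence)
    # Equivalent of joining the source to @sequence on Code.
    filtered = [c for c in codes_by_date if c in wanted]
    return [
        (rn, code)
        for rn, code in enumerate(filtered, start=1)       # row_number() over EffectiveDate
        if rn <= len(sequence) and sequence[rn - 1] == code  # join on rn = sequence and Code
    ]


def is_full_match(result, sequence):
    """Stand-in for the sum/checksum check: accept only a complete, in-order match."""
    return [code for _, code in result] == list(sequence)


seq = [51, 61, 9]

clean = rownumber_join([51, 14, 8, 61, 28, 9], seq)
print(clean, is_full_match(clean, seq))    # [(1, 51), (2, 61), (3, 9)] True

# A second 51 shifts every later row number, so the join keeps only the first row.
shifted = rownumber_join([51, 51, 61, 9], seq)
print(shifted, is_full_match(shifted, seq))  # [(1, 51)] False
```

Without the check, the second case would hand back a single row as if it were a result. Note that the sketch compares the codes directly, which is stricter than comparing sums and checksums.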
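If you want it all inline, you can use a different approach that leverages CTEs and recursion to perform a running sum plus an OrdPath-like comparison (though you would still need to parse the sequence character data into a data set):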
-- Sequence data with running total
with sequenceWithRunningTotal as
(
-- Anchor
select s.sequence, s.Code, s.Code as runningTotal, cast(s.Code as varchar(8000)) as pth,
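(select sum(x.Code) from @sequence x) as sumCode -- total over the whole sequence; a windowed sum here runs after the where clause and would see only the sequence = 1 row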
from @sequence s
where s.sequence = 1
-- Recurse
union all
select s.sequence, s.Code, b.runningTotal + s.Code as runningTotal,
b.pth + '.' + cast(s.Code as varchar(8000)) as pth,
b.sumCode as sumCode
from @sequence s
join sequenceWithRunningTotal b
on s.sequence = b.sequence + 1
),
-- Source data with sequence value
srcData as
(
select row_number() over(order by t.EffectiveDate) as rn,
t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
sum(t.Code) over(partition by 1) as sumCode
from #TableName t
join @sequence s
on t.Code = s.Code
where t.ItemId = @item_id
),
-- Source data with running sum
sourceWithRunningSum as
(
-- Anchor
select t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
t.Code as runningTotal, cast(t.Code as varchar(8000)) as pth,
t.sumCode
from srcData t
where t.rn = 1
-- Recurse
union all
select t.rn, t.TransactionId, t.ItemId, t.Code, t.EffectiveDate, t.CreateDate,
s.runningTotal + t.Code as runningTotal,
s.pth + '.' + cast(t.Code as varchar(8000)) as pth,
t.sumCode
from srcData t
join sourceWithRunningSum s
on t.rn = s.rn + 1
)
select d.TransactionId, d.ItemId, d.Code, d.EffectiveDate, d.CreateDate
from sourceWithRunningSum d
join sequenceWithRunningTotal s
on d.rn = s.sequence
and d.Code = s.Code
and d.runningTotal = s.runningTotal
and d.pth = s.pth
and d.sumCode = s.sumCode
order by d.rn;
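For readers unfamiliar with the running-total and path columns, this small Python sketch shows the values the two recursive CTEs are meant to build for each row. The function name `running_columns` is hypothetical and the code only mirrors the `runningTotal` and `pth` columns, not the query itself.

```python
from itertools import accumulate


def running_columns(codes):
    """Return (code, runningTotal, pth) per row, like the recursive CTEs above."""
    totals = accumulate(codes)  # cumulative sum of the codes
    # Dotted concatenation of every code seen so far, e.g. "51.61".
    paths = accumulate((str(c) for c in codes), lambda acc, nxt: acc + "." + nxt)
    return list(zip(codes, totals, paths))


for row in running_columns([51, 61, 9]):
    print(row)
# (51, 51, '51')
# (61, 112, '51.61')
# (9, 121, '51.61.9')
```

The final join only keeps a source row when its position, code, running total, and path all equal those of the sequence row, so any row out of order breaks the path from that point on.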
Answer 2 (score: 0)
This is just off the top of my head and untested, so it may need some tweaking:
SELECT DISTINCT
T.TransactionID,
T.ItemID,
T.Code,
T.EffectiveDate,
T.CreateDate
FROM
My_Table T
INNER JOIN (
SELECT
-- Derived-table columns need unique names, so alias each TransactionID
T1.TransactionID AS FirstId,
T2.TransactionID AS SecondId,
T3.TransactionID AS ThirdId
FROM
My_Table T1
INNER JOIN My_Table T2 ON
T2.ItemID = T1.ItemID AND
T2.Code = 61 AND
T2.EffectiveDate > T1.EffectiveDate
INNER JOIN My_Table T3 ON
T3.ItemID = T1.ItemID AND
T3.Code = 9 AND
T3.EffectiveDate > T2.EffectiveDate
WHERE
T1.Code = 51
) SQ ON
-- T1, T2 and T3 are not visible outside the derived table, so match on its aliased columns
T.TransactionID IN (SQ.FirstId, SQ.SecondId, SQ.ThirdId)