Question

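I want to update a table with a cumulative sum. The update must run in a specific order, otherwise the cumulative values come out wrong.

Consider a table Trans with the following columns: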

TransID, TransDate, Amount, Balance

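The TransDate column holds only a date, not a time. Consequently there can be many transactions on the same day, and TransID then determines the correct order.

Here is what I would like to do: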

Declare @Total as int = 0
Update Trans Set @Total=@Total+Amount,Balance=@Total Order By TransDate,TransID

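The SQL is invalid because of the Order By clause. If I remove the Order By clause, it only works when all the transactions were entered in order.

I have searched other posts but could not find a satisfactory answer. My only other option is to create an SP and use a cursor to step through the transactions and update them one at a time.

Any ideas?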

Answer 1

[EDIT3]

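Using a cursor for this is the best approach I have found on SQL 2008. On my low-end 32-bit XP virtual machine with 4 GB of RAM (on an SSD, though) I processed 500k records in 60 seconds.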
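The answer does not show its cursor code, so the following is only a minimal sketch of what such a cursor could look like. It assumes the Trans table from the question, that TransID is unique, and that Amount and Balance are money columns; the cursor and variable names are made up for illustration.

```sql
-- Sketch only: assumes Trans(TransID, TransDate, Amount, Balance) from the
-- question, with TransID unique and Amount/Balance of type money.
DECLARE @TransID int, @Amount money, @Total money = 0;

-- Read the rows in the order the running total must follow.
DECLARE trans_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT TransID, Amount
    FROM Trans
    ORDER BY TransDate, TransID;

OPEN trans_cur;
FETCH NEXT FROM trans_cur INTO @TransID, @Amount;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Treat a NULL amount as zero so it does not wipe out the total.
    SET @Total = @Total + ISNULL(@Amount, 0);

    UPDATE Trans
    SET Balance = @Total
    WHERE TransID = @TransID;

    FETCH NEXT FROM trans_cur INTO @TransID, @Amount;
END

CLOSE trans_cur;
DEALLOCATE trans_cur;
```

Each row is updated individually, which is exactly the pattern the question hoped to avoid; the point of the sketch is only that the ORDER BY lives in the cursor's SELECT, where it is legal.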

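For those who want it, the dissertation follows...

I keep coming back to this problem because I have had to deal with it so often in the past, but I never felt I had enough time to explore options better than the first solution that ran and produced the right answer.

I now realize I should have taken the time. The subquery method I give below is easy to write, but it is absolutely the worst thing you can do for performance.

Since cursors are generally frowned upon as well, I tried a recursive CTE as the first alternative. It scales well to large record sets, but it has two drawbacks: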

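1. The maximum recursion count is 32767. If you have 32768 records, you have to break the task into chunks and loop over them. Messy.
2. The CTE method needs a key that is in sequence with no gaps. That never happens in production, so I had to roll my own, which meant using a #tmp table and then updating back to the live table. Totally unnecessary, especially considering point 1.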
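The recursive CTE itself is not shown in the answer. The sketch below is one way the two points above could play out, under the assumption that the live table is the Trans table from the question with a unique TransID; #seq, Seq, IX_seq and running are hypothetical names. As far as the SQL Server documentation goes, 32767 is the largest explicit MAXRECURSION value, and MAXRECURSION 0 is described as removing the limit altogether, which may make the chunking in point 1 avoidable.

```sql
-- Sketch only: build a gapless sequence key in a work table (#seq is a
-- hypothetical name), then walk it one row at a time with a recursive CTE.
SELECT ROW_NUMBER() OVER (ORDER BY TransDate, TransID) AS Seq,
       TransID,
       Amount
INTO #seq
FROM Trans;

CREATE UNIQUE CLUSTERED INDEX IX_seq ON #seq (Seq);

WITH running AS (
    -- Anchor: the first row in sequence starts the balance.
    SELECT Seq, TransID, CAST(ISNULL(Amount, 0) AS money) AS Balance
    FROM #seq
    WHERE Seq = 1
    UNION ALL
    -- Recursive step: add the next row's amount to the previous balance.
    SELECT s.Seq, s.TransID, CAST(r.Balance + ISNULL(s.Amount, 0) AS money)
    FROM running AS r
    JOIN #seq AS s ON s.Seq = r.Seq + 1
)
UPDATE t
SET t.Balance = r.Balance
FROM Trans AS t
JOIN running AS r ON r.TransID = t.TransID
OPTION (MAXRECURSION 32767);  -- largest explicit cap; 0 is documented as "no limit"

DROP TABLE #seq;
```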

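This is a case where "avoid cursors at all costs" is the wrong call. SQL 2012 does have new ways of doing this, but I am not going to run out and buy a Win 7 license just to try them...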
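The answer does not say which SQL 2012 features it has in mind. One likely candidate is the windowed SUM() with an ORDER BY and a ROWS frame, which SQL Server supports from the 2012 release onward. A minimal sketch, assuming the Trans table from the question (the CTE name ordered and the alias RunningTotal are made up):

```sql
-- Sketch only, SQL Server 2012 or later: compute the running total with a
-- window aggregate and write it back through an updatable CTE.
WITH ordered AS (
    SELECT Balance,
           SUM(Amount) OVER (ORDER BY TransDate, TransID
                             ROWS UNBOUNDED PRECEDING) AS RunningTotal
    FROM Trans
)
UPDATE ordered
SET Balance = RunningTotal;
```

The ORDER BY that was illegal on the UPDATE statement is legal inside the OVER clause, so the ordering requirement from the question is stated directly.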

[EDIT2]

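I realized this morning that relying on TransDate and TransID always being in sequence is a bad idea. The real world is full of back-dated transactions and/or ID values that do not necessarily increase in order. In those cases the code in the original answer breaks, so I have revised it as follows: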

create table #tmp (
    TransID int identity(1,1) not null, 
    TransDate datetime not null, 
    Amount money null, 
    Balance money null,
    CompositeKey bigint null,
    primary key clustered (TransID)
);

insert into #tmp (TransDate, Amount)
select '2014-02-12', 100
union all
select '2014-02-12', 56
union all
select '2014-02-12', 38
union all
select '2014-02-12', 350
union all
select '2014-02-12', 980
union all
select '2014-02-13', 25
union all
select '2014-02-13', 80
union all
select '2014-02-13', 45
union all
select '2014-02-13', 269
union all
select '2014-02-11', 10000 -- this is an out-of-sequence record which breaks the original code

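-- Scale factor: the next power of 10 above the largest TransID, so that
-- TransID always fits in the low-order digits of the composite key
declare @maxID int = (select MAX(TransID) from #tmp)
set @maxID = power(10,LEN(@maxId))

-- CAST(datetime as bigint) yields days since 1900-01-01 (TransDate has no
-- time part), so the key sorts by date first, then by TransID
update #tmp
set CompositeKey = CAST(TransDate as bigint) * @maxID + TransID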

create nonclustered index IX_#tmp_CompositeKey
    on #tmp (CompositeKey);

update t1
set t1.Balance = t2.Balance
from #tmp as t1
left join (
    select t.TransID, t.TransDate, t.Amount, 
        (
            select sum(Amount) as Balance
            from #tmp as s
            --where s.TransDate <= t.TransDate and s.TransID <= t.TransID -- this gives an improper running balance
            where s.CompositeKey <= t.CompositeKey -- this gives the proper running balance
        ) as Balance
    from #tmp as t
) as t2
    on t1.TransDate = t2.TransDate and t1.TransId = t2.TransId

select *
from #tmp
order by TransDate, TransID

drop table #tmp

[BEGIN ORIGINAL ANSWER]

The top Google hit for "sql server running sum" is:

Calculate a Running Total in SQL Server

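It contains an in-depth treatment of several methods, with notes on how each one performs. Cursors are mentioned as one of the better-performing methods, although I understand your reluctance to use them. Do none of these methods suffice?

[EDIT]

I am happy with a subquery: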

create table #tmp (
    TransID int identity(1,1) not null, 
    TransDate datetime not null, 
    Amount money null, 
    Balance money null,
    primary key clustered (TransDate, TransID)
);

insert into #tmp (TransDate, Amount)
select '2014-02-12', 100
union all
select '2014-02-12', 56
union all
select '2014-02-12', 38
union all
select '2014-02-12', 350
union all
select '2014-02-12', 980
union all
select '2014-02-13', 25
union all
select '2014-02-13', 80
union all
select '2014-02-13', 45
union all
select '2014-02-13', 269
union all
select '2014-02-13', 42

update t1
set t1.Balance = t2.Balance
from #tmp as t1
left join (
    select t.TransID, t.TransDate, t.Amount, 
        (
            select sum(Amount) as Balance
            from #tmp as s
            where s.TransDate <= t.TransDate and s.TransID <= t.TransID
        ) as Balance
    from #tmp as t
) as t2
    on t1.TransDate = t2.TransDate and t1.TransId = t2.TransId

select *
from #tmp
order by TransDate, TransID

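This is the simplest method I can think of. On large record sets it can bog down, hence my specific design of the primary key. On very large record sets, for this kind of thing I break the work into chunks using an indexed field (TransDate would be the index candidate here) and then iterate over the updates with a cursor or dynamically generated SQL.

The CTE is another method that is popular in the referenced article, but writing recursive CTEs breaks my brain, and the performance gain is not at all clear to me.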

Answer 2

Here is one way:

DECLARE @t table (
   transid   int identity(9,37)
 , transdate datetime
 , amount    decimal(15,4)
 , balance   decimal(15,4)
);

INSERT INTO @t (transdate, amount)
  VALUES ('2014-02-12', 100)
       , ('2014-02-12',  56)
       , ('2014-02-12',  38)
       , ('2014-02-12', 350)
       , ('2014-02-12', 980)
       , ('2014-02-13',  25)
       , ('2014-02-13',  80)
       , ('2014-02-13',  45)
       , ('2014-02-13', 269)
       , ('2014-02-13',  42)
;

; WITH x AS (
  SELECT transid
       , amount
       , Row_Number() OVER (ORDER BY transdate, transid) As sequence
  FROM   @t
)
SELECT x.transid
     , x.amount
     , x.sequence
     , Sum(prev.amount) As running_sum
FROM   x
 LEFT
  JOIN x As prev
    ON prev.sequence <= x.sequence
GROUP
    BY x.transid
     , x.amount
     , x.sequence

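The idea here is that you use the window function Row_Number() to give every row a sequential number. That makes it possible to join to the "previous" rows based on this ordering.

If you remove the grouping, you will notice that there is only one row for x.sequence = 1. For x.sequence = 2 there are two rows (prev.sequence 1 and 2), and so on.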
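To make that fan-out visible, here is a small self-contained sketch of the same self-join without the GROUP BY, run against a three-row table variable. The sample rows and the prev_sequence and prev_amount aliases are made up for illustration.

```sql
-- Sketch only: the self-join from above without GROUP BY, on three sample rows.
DECLARE @t table (
    transid   int identity(1,1)
  , transdate datetime
  , amount    decimal(15,4)
);

INSERT INTO @t (transdate, amount)
  VALUES ('2014-02-12', 100)
       , ('2014-02-12',  56)
       , ('2014-02-13',  25);

WITH x AS (
  SELECT transid
       , amount
       , Row_Number() OVER (ORDER BY transdate, transid) As sequence
  FROM   @t
)
SELECT x.sequence
     , x.amount
     , prev.sequence As prev_sequence
     , prev.amount   As prev_amount
FROM   x
 LEFT
  JOIN x As prev
    ON prev.sequence <= x.sequence
ORDER
    BY x.sequence
     , prev.sequence;
-- Three input rows produce 1 + 2 + 3 = 6 joined rows.
```

In general n input rows produce n(n+1)/2 joined rows before grouping, so the work grows quadratically with the size of the table.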

As a result, this method does not stay efficient as the result set gets large.

SQL - Update cumulative sum in order
