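Let's assume you have a table of orders named Table1, in chronological order as returned from an inline UDF. Please note that OrderID may be out of sync, so I have intentionally created an anomaly here (i.e. I have not included the Date field, but I have access to that column if that makes it easier for you).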
OrderID BuySell FilledSize ExecutionPrice RunningTotal AverageBookCost RealisedPnL
------- ------- ---------- -------------- ------------ --------------- -----------
339     Buy     2          24.5           NULL         NULL            NULL
375     Sell    3          23.5           NULL         NULL            NULL
396     Sell    3          20.5           NULL         NULL            NULL
416     Sell    1          16.4           NULL         NULL            NULL
405     Buy     4          18.2           NULL         NULL            NULL
421     Sell    1          16.7           NULL         NULL            NULL
432     Buy     3          18.6           NULL         NULL            NULL
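I have a function that I would like to apply recursively from top to bottom to calculate the three NULL columns; the input of each call is the output of the previous call. The function is called mfCalc_RunningTotalBookCostPnL, and I have attached it below: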
CREATE FUNCTION [fMath].[mfCalc_RunningTotalBookCostPnL](
@BuySell VARCHAR(4),
@FilledSize DECIMAL(31,15),
@ExecutionPrice DECIMAL(31,15),
@OldRunningTotal DECIMAL(31,15),
@OldBookCost DECIMAL(31,15)
)
RETURNS @ReturnTable TABLE(
NewRunningTotal DECIMAL(31,15),
NewBookCost DECIMAL(31,15),
PreMultRealisedPnL DECIMAL(31,15)
)
AS
BEGIN
DECLARE @SignedFilledSize DECIMAL(31,15),
@NewRunningTotal DECIMAL(31,15),
@NewBookCost DECIMAL(31,15),
@PreMultRealisedPnL DECIMAL(31,15)
SET @SignedFilledSize = fMath.sfSignedSize(@BuySell, @FilledSize)
SET @NewRunningTotal = @OldRunningTotal + @SignedFilledSize
SET @PreMultRealisedPnL = 0
IF SIGN(@SignedFilledSize) = SIGN(@OldRunningTotal)
-- This Trade is adding to the existing position.
SET @NewBookCost = (@SignedFilledSize * @ExecutionPrice +
@OldRunningTotal * @OldBookCost) / (@NewRunningTotal)
ELSE
BEGIN
-- This trade is reversing the existing position.
-- This could be buying when short or selling when long.
DECLARE @AbsClosedSize DECIMAL(31,15)
SET @AbsClosedSize = fMath.sfMin(ABS(@SignedFilledSize), ABS(@OldRunningTotal));
-- There must be Crystalising of PnL.
SET @PreMultRealisedPnL = (@ExecutionPrice - @OldBookCost) * @AbsClosedSize * SIGN(-@SignedFilledSize)
-- Work out the NewBookCost
SET @NewBookCost = CASE
WHEN ABS(@SignedFilledSize) < ABS(@OldRunningTotal) THEN @OldBookCost
WHEN ABS(@SignedFilledSize) = ABS(@OldRunningTotal) THEN 0
WHEN ABS(@SignedFilledSize) > ABS(@OldRunningTotal) THEN @ExecutionPrice
END
END
-- Insert values into Return Table
INSERT INTO @ReturnTable
VALUES (@NewRunningTotal, @NewBookCost, @PreMultRealisedPnL)
-- Return
RETURN
END
So the T-SQL command I am looking for (I don't mind if someone wants to use an OUTER APPLY too) would generate the following result set:
OrderID BuySell FilledSize ExecutionPrice RunningTotal AverageBookCost RealisedPnL
------- ------- ---------- -------------- ------------ --------------- -----------
339     Buy     2          24.5           2            24.5            0
375     Sell    3          23.5           -1           23.5            -2
396     Sell    3          20.5           -4           21.25           0
416     Sell    1          16.4           -5           20.28           0
405     Buy     4          18.2           -1           20.28           8.32
421     Sell    1          16.7           -2           18.49           0
432     Buy     3          18.6           1            18.6            -0.22
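A few notes: the function above calls a trivial function, fMath.sfSignedSize, which just makes ('Sell', 3) = -3. Also, for the avoidance of doubt, I would expect the solution to make the following calls in this order, assuming my calculations are correct (note that I start off assuming OldRunningTotal and OldBookCost are both zero):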
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',2,24.5,0,0)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',3,23.5,2,24.5)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',3,20.5,-1,23.5)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',1,16.4,-4,21.25)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',4,18.2,-5,20.28)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Sell',1,16.7,-1,20.28)
SELECT * FROM fMath.mfCalc_RunningTotalBookCostPnL('Buy',3,18.6,-2,18.49)
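Neither scalar helper is reproduced in this post. A minimal sketch of what fMath.sfSignedSize and fMath.sfMin might look like, assuming the fMath schema already exists and that anything other than 'Sell' is treated as a buy. Only the ('Sell', 3) = -3 behaviour comes from the description above; the function bodies are illustrative guesses:

```sql
-- Hypothetical reconstruction: only ('Sell', 3) -> -3 is taken from the post.
CREATE FUNCTION fMath.sfSignedSize(@BuySell VARCHAR(4), @FilledSize DECIMAL(31,15))
RETURNS DECIMAL(31,15)
AS
BEGIN
    -- Sells reduce the position; everything else counts as a buy (assumption).
    RETURN CASE WHEN @BuySell = 'Sell' THEN -@FilledSize ELSE @FilledSize END;
END
GO

-- Hypothetical reconstruction: returns the smaller of the two arguments.
CREATE FUNCTION fMath.sfMin(@A DECIMAL(31,15), @B DECIMAL(31,15))
RETURNS DECIMAL(31,15)
AS
BEGIN
    RETURN CASE WHEN @A < @B THEN @A ELSE @B END;
END
GO

-- Quick check: SignedSize comes back as -3 and MinValue as 2.
SELECT fMath.sfSignedSize('Sell', 3) AS SignedSize,
       fMath.sfMin(3, 2)             AS MinValue;
```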
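Obviously, [fMath].[mfCalc_RunningTotalBookCostPnL] may need to be tweaked so that it can start with NULL entries for OldRunningTotal and OldBookCost, but that is trivial. Applying set-based SQL to something that is recursive in nature is a little more difficult.
Many thanks, Bertie.
Answer 0 (score: 4)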
create table Test(
OrderID int primary key,
Qty int not null
);
declare @i int = 1;
while @i <= 5000 begin
insert into Test(OrderID, Qty) values (@i * 2,rand() * 10);
set @i = @i + 1;
end;
The recursive solution takes 9 seconds:
with T AS
(
select ROW_NUMBER() over(order by OrderID) as rn, * from test
)
,R(Rn, OrderId, Qty, RunningTotal) as
(
select Rn, OrderID, Qty, Qty
from t
where rn = 1
union all
select t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
from t t
join r p on t.rn = p.rn + 1
)
select R.OrderId, R.Qty, R.RunningTotal from r
option(maxrecursion 0);
The UPDATE approach takes 0 seconds:
create function TestRunningTotal()
returns @ReturnTable table(
OrderId int, Qty int, RunningTotal int
)
as begin
insert into @ReturnTable(OrderID, Qty, RunningTotal)
select OrderID, Qty, 0 from Test
order by OrderID;
declare @RunningTotal int = 0;
update @ReturnTable set
RunningTotal = @RunningTotal,
@RunningTotal = @RunningTotal + Qty;
return;
end;
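Those two approaches should at least give you a framework to build your query upon.
BTW, in SQL Server, unlike in MySQL, the order of variable assignment doesn't matter. This: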
update @ReturnTable set
RunningTotal = @RunningTotal,
@RunningTotal = @RunningTotal + Qty;
And the following:
update @ReturnTable set
@RunningTotal = @RunningTotal + Qty,
RunningTotal = @RunningTotal;
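They both execute the same way: the variable assignments happen first, regardless of where they appear in the statement. Both queries produce the same output: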
OrderId Qty RunningTotal
----------- ----------- ------------
2 4 4
4 8 12
6 4 16
8 5 21
10 3 24
12 8 32
14 2 34
16 9 43
18 1 44
20 2 46
22 0 46
24 2 48
26 6 54
On your exact table, just detect Buy/Sell. You can either multiply by 1 and -1 respectively, or simply sign the field, e.g.:
update @ReturnTable set
@RunningTotal = @RunningTotal +
CASE WHEN BuySell = 'Buy' THEN Qty ELSE -Qty END,
RunningTotal = @RunningTotal;
If you happen to upgrade to SQL Server 2012, here is the straightforward implementation of a running total:
select OrderID, Qty, sum(Qty) over(order by OrderID) as RunningTotal
from Test
On your exact problem:
select OrderID, Qty,
sum(CASE WHEN BuySell = 'Buy' THEN Qty ELSE -Qty END)
over(order by OrderID) as RunningTotal
from Test;
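As a self-contained sketch of the windowed version on the question's sample rows: Seq is a hypothetical ordering column standing in for the Date field the question mentions (OrderID itself is out of sequence there), and the table variable exists only for illustration. It needs SQL Server 2012 or later:

```sql
-- Seq is an assumed ordering column; the question does not show one.
DECLARE @Orders TABLE (
    Seq        INT PRIMARY KEY,
    OrderID    INT NOT NULL,
    BuySell    VARCHAR(4) NOT NULL,
    FilledSize INT NOT NULL
);

INSERT INTO @Orders (Seq, OrderID, BuySell, FilledSize) VALUES
    (1, 339, 'Buy',  2), (2, 375, 'Sell', 3), (3, 396, 'Sell', 3),
    (4, 416, 'Sell', 1), (5, 405, 'Buy',  4), (6, 421, 'Sell', 1),
    (7, 432, 'Buy',  3);

-- Sign each fill, then accumulate in Seq order.
SELECT OrderID, BuySell, FilledSize,
       SUM(CASE WHEN BuySell = 'Buy' THEN FilledSize ELSE -FilledSize END)
           OVER (ORDER BY Seq ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM @Orders
ORDER BY Seq;
-- RunningTotal: 2, -1, -4, -5, -1, -2, 1
```

This only covers RunningTotal. AverageBookCost and RealisedPnL depend on the previous row's output, so a plain windowed SUM is not enough for those two columns.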
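Update
If you feel uneasy about the quirky update, you can add a guard clause that checks whether the order of the rows being updated matches the original order (aided by identity(1,1)):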
create function TestRunningTotalGuarded()
returns @ReturnTable table(
OrderId int, Qty int,
RunningTotal int not null,
RN int identity(1,1) not null
)
as begin
insert into @ReturnTable(OrderID, Qty, RunningTotal)
select OrderID, Qty, 0 from Test
order by OrderID;
declare @RunningTotal int = 0;
declare @RN_check INT = 0;
update @ReturnTable set
@RN_check = @RN_check + 1,
@RunningTotal =
(case when RN = @RN_check then @RunningTotal + Qty else 1/0 end),
RunningTotal = @RunningTotal;
return;
end;
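If the UPDATE really does update rows in an unpredictable order (or by any chance it will), @RN_check will no longer equal RN (the identity order), and the code will then raise a divide-by-zero error. With the guard clause, an unpredictable update order will fail fast; if that happens, it is time to file a bug petition with Microsoft to make the quirky update not so quirky :-)
The guard clause hedges on the assumption that an inherently imperative operation (variable assignment) really is executed sequentially.
Answer 1 (score: 2)
This is a bit of a stab in the dark, since I don't have a fully functioning [fMath].[mfCalc_RunningTotalBookCostPnL] to test with. My track record of getting recursive CTEs right the first time, before testing, is only about 50%, but even if it is not perfect it should be enough to get you started, provided I understand your requirements correctly: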
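-- First, cache Table1 into #temp to improve recursive CTE performance
-- (ORDER BY OrderID assumes OrderID reflects execution order; order by the Date column if it does not)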
select
RowNum=ROW_NUMBER()OVER(ORDER BY OrderID)
, *
INTO #temp
FROM Table1;
GO
; WITH CTE (RowNum,OrderID, BuySell, FilledSize, ExecutionPrice, RunningTotal, AverageBookCost, RealisedPnL) AS (
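-- Anchor: apply the function to the first row, starting from a zero position and zero book cost
SELECT t.RowNum, t.OrderID, t.BuySell, t.FilledSize, t.ExecutionPrice
, RunningTotal=c.NewRunningTotal, AverageBookCost=c.NewBookCost, RealisedPnL=c.PreMultRealisedPnL
FROM #temp t
CROSS APPLY [fMath].[mfCalc_RunningTotalBookCostPnL](t.BuySell, t.FilledSize, t.ExecutionPrice, 0, 0) AS c
WHERE t.RowNum=1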
UNION ALL
SELECT t.RowNum, t.OrderID, t.BuySell, t.FilledSize, t.ExecutionPrice
, RunningTotal=c.NewRunningTotal, AverageBookCost=c.NewBookCost, RealisedPnL=c.PreMultRealisedPnL
FROM #temp t
INNER JOIN CTE ON CTE.RowNum+1 = t.RowNum
CROSS APPLY [fMath].[mfCalc_RunningTotalBookCostPnL](t.BuySell, t.FilledSize, t.ExecutionPrice, CTE.RunningTotal, CTE.AverageBookCost) AS c
)
SELECT OrderID, BuySell, FilledSize, ExecutionPrice, RunningTotal, AverageBookCost, RealisedPnL
FROM CTE
/* Replace the above SELECT with the following after testing ok
UPDATE tab
SET RunningTotal=CTE.RunningTotal
, AverageBookCost=CTE.AverageBookCost
, RealisedPnL=CTE.RealisedPnL
FROM Table1 tab
INNER JOIN CTE on CTE.OrderID=tab.OrderID
*/
OPTION (MAXRECURSION 32767);
GO
-- clean up
DROP TABLE #temp
GO
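One more disclaimer: the MAXRECURSION hint accepts values up to 32767. If your data set is deeper than that, you can specify MAXRECURSION 0 to remove the limit (at the risk of an endless loop in a badly formed query), explore a different approach, or apply some sort of windowing to the data set.
Answer 2 (score: 0)
I re-created the running total queries to include a partition (on customer).
CTE approach: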
with T AS
(
select
ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, *
from test
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
select CustomerCode, Rn, OrderID, Qty, Qty
from t
where rn = 1
union all
select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
from t t
join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId
option(maxrecursion 0);
Quirky update approach:
create function TestRunningTotalGuarded()
returns @ReturnTable table(
CustomerCode varchar(50), OrderId int, Qty int,
RunningTotal int not null, RN int identity(1,1) not null
)
as begin
insert into @ReturnTable(CustomerCode, OrderID, Qty, RunningTotal)
select CustomerCode, OrderID, Qty, 0 from Test
order by CustomerCode, OrderID;
declare @RunningTotal int;
declare @RN_check INT = 0;
declare @PrevCustomerCode varchar(50) = NULL;
update @ReturnTable set
@RN_check = @RN_check + 1,
@RunningTotal =
(case when RN = @RN_check then
case when @PrevCustomerCode = CustomerCode then
@RunningTotal + Qty
else
Qty
end
else
1/0
end),
@PrevCustomerCode = CustomerCode,
RunningTotal = @RunningTotal;
return;
end;
Cursor approach (code compacted to remove the scrollbar):
create function TestRunningTotalCursor()
returns @ReturnTable table(CustomerCode varchar(50), OrderId int,
Qty int, RunningTotal int not null) as
begin
declare @c_CustomerCode varchar(50);
declare @c_OrderID int;
declare @c_qty int;
declare @PrevCustomerCode varchar(50) = null;
declare @RunningTotal int = 0;
declare o_cur cursor for
select CustomerCode, OrderID, Qty from Test order by CustomerCode, OrderID;
open o_cur;
fetch next from o_cur into @c_CustomerCode, @c_OrderID, @c_Qty;
while @@FETCH_STATUS = 0 begin
if @c_CustomerCode = @PrevCustomerCode begin
set @RunningTotal = @RunningTotal + @c_qty;
end else begin
set @RunningTotal = @c_Qty;
end;
set @PrevCustomerCode = @c_CustomerCode;
insert into @ReturnTable(CustomerCode, OrderId, Qty, RunningTotal)
values(@c_CustomerCode, @c_OrderID, @c_Qty, @RunningTotal);
fetch next from o_cur into @c_CustomerCode, @c_OrderID, @c_Qty;
end;
close o_cur; deallocate o_cur; return;
end;
Metrics on 5,000 rows:
* Recursive CTE : 49 seconds
* Quirky Update : 0 second
* Cursor : 0 second
Those 0 seconds are not meaningful. After bumping the row count to 50,000, here are the metrics:
* Quirky Update : 1 second
* Cursor : 3 second
* Recursive CTE : An hour
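Caveat: I found out that the quirky update really is quirky. Sometimes it works and sometimes it doesn't (as indicated by a divide-by-zero error in one out of five runs of the query).
Here is the DDL for the data: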
create table Test(
OrderID int primary key,
CustomerCode varchar(50),
Qty int not null
);
declare @i int = 1;
while @i <= 20 begin
insert into Test(OrderID, CustomerCode, Qty) values (
@i * 2
,case @i % 4
when 0 then 'JOHN'
when 1 then 'PAUL'
when 2 then 'GEORGE'
when 3 then 'RINGO'
end
,rand() * 10);
set @i = @i + 1;
end;
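Update
Apparently, the pure CTE approach is not good. One must use a hybrid approach: when the row numbering is materialized into an actual table, the speed goes up.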
select ROW_NUMBER() over(partition by CustomerCode order by OrderID) as rn, * into #xxx
from test;
with T AS
(
select * from #xxx
)
,R(CustomerCode, Rn, OrderId, Qty, RunningTotal) as
(
select CustomerCode, Rn, OrderID, Qty, Qty
from t
where rn = 1
union all
select t.CustomerCode, t.Rn, t.OrderId, t.Qty, p.RunningTotal + t.Qty
from t t
join r p on p.CustomerCode = t.CustomerCode and t.rn = p.rn + 1
)
select R.CustomerCode, R.OrderId, R.Qty, R.RunningTotal from r
order by R.CustomerCode, R.OrderId
option(maxrecursion 0);
drop table #xxx;
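To recap, here are the metrics before converting the pure CTE to use materialized row numbering (i.e. row-numbered results stored in an actual table, here a temporary table):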
* Quirky Update : 1 second
* Cursor : 3 second
* Recursive CTE(Pure) : An hour
After materializing the row numbering into a temporary table:
* Quirky Update : 1 second
* Cursor : 3 second
* Recursive CTE(Hybrid) : 2 second (inclusive of row numbering table materialization)
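The hybrid recursive CTE approach is actually faster than the cursor approach.
Another update
Just by putting a clustered primary key on the sequential column, the UPDATE updates rows in their physical order. The divide-by-zero (the guard clause that detects non-sequential updates) no longer occurs, e.g.: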
alter function TestRunningTotalGuarded()
returns @ReturnTable table(
CustomerCode varchar(50), OrderId int, Qty int,
RunningTotal int not null,
RN int identity(1,1) not null primary key clustered
)
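I tried running the quirky update (with the clustered primary key in place) 100 times; if there are corner cases, I haven't found any so far. I haven't encountered a single divide-by-zero error. Read the conclusion at the bottom of this blog post: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
Even with the clustered primary key in place, it is still fast.
Here are the metrics for 100,000 rows: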
* Quirky Update : 3 seconds
* Hybrid Recursive CTE : 5 seconds
* Cursor : 6 seconds
The quirky update (not so quirky after all) is still fast. It is faster than the hybrid recursive CTE.
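To tie this back to the original question, here is a rough sketch of how the cursor pattern could carry the previous row's output into the next calculation. It inlines the logic of the question's function instead of calling it so that the batch is self-contained. Seq is a hypothetical ordering column (the question mentions a Date field that could serve the same purpose), and DECIMAL(18,6) is an arbitrary choice for the sketch:

```sql
-- Seq is an assumed ordering column; OrderID is out of sequence in the question.
DECLARE @Orders TABLE (
    Seq INT PRIMARY KEY, OrderID INT, BuySell VARCHAR(4),
    FilledSize DECIMAL(18,6), ExecutionPrice DECIMAL(18,6),
    RunningTotal DECIMAL(18,6), AverageBookCost DECIMAL(18,6), RealisedPnL DECIMAL(18,6)
);
INSERT INTO @Orders (Seq, OrderID, BuySell, FilledSize, ExecutionPrice) VALUES
    (1, 339, 'Buy',  2, 24.5), (2, 375, 'Sell', 3, 23.5), (3, 396, 'Sell', 3, 20.5),
    (4, 416, 'Sell', 1, 16.4), (5, 405, 'Buy',  4, 18.2), (6, 421, 'Sell', 1, 16.7),
    (7, 432, 'Buy',  3, 18.6);

DECLARE @Seq INT, @BuySell VARCHAR(4), @Size DECIMAL(18,6), @Price DECIMAL(18,6);
DECLARE @Signed DECIMAL(18,6), @Closed DECIMAL(18,6), @PnL DECIMAL(18,6);
-- State carried from one row to the next, starting from a flat position.
DECLARE @Total DECIMAL(18,6) = 0, @Cost DECIMAL(18,6) = 0;

-- STATIC works on a snapshot, so updating @Orders inside the loop is safe.
DECLARE o_cur CURSOR LOCAL STATIC FOR
    SELECT Seq, BuySell, FilledSize, ExecutionPrice FROM @Orders ORDER BY Seq;
OPEN o_cur;
FETCH NEXT FROM o_cur INTO @Seq, @BuySell, @Size, @Price;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Signed = CASE WHEN @BuySell = 'Sell' THEN -@Size ELSE @Size END;
    SET @PnL = 0;
    IF SIGN(@Signed) = SIGN(@Total)
        -- Adding to the position: size-weighted average of old and new cost.
        SET @Cost = (@Signed * @Price + @Total * @Cost) / (@Total + @Signed);
    ELSE
    BEGIN
        -- Reducing or flipping the position: realise PnL on the closed part.
        SET @Closed = CASE WHEN ABS(@Signed) < ABS(@Total)
                           THEN ABS(@Signed) ELSE ABS(@Total) END;
        SET @PnL = (@Price - @Cost) * @Closed * SIGN(-@Signed);
        SET @Cost = CASE WHEN ABS(@Signed) < ABS(@Total) THEN @Cost
                         WHEN ABS(@Signed) = ABS(@Total) THEN 0
                         ELSE @Price END;
    END;
    SET @Total = @Total + @Signed;

    UPDATE @Orders
    SET RunningTotal = @Total, AverageBookCost = @Cost, RealisedPnL = @PnL
    WHERE Seq = @Seq;

    FETCH NEXT FROM o_cur INTO @Seq, @BuySell, @Size, @Price;
END;
CLOSE o_cur; DEALLOCATE o_cur;

SELECT OrderID, BuySell, FilledSize, ExecutionPrice,
       RunningTotal, AverageBookCost, RealisedPnL
FROM @Orders ORDER BY Seq;
-- RunningTotal:    2, -1, -4, -5, -1, -2, 1
-- AverageBookCost: 24.5, 23.5, 21.25, 20.28, 20.28, 18.49, 18.6
-- RealisedPnL:     0, -2, 0, 0, 8.32, 0, -0.22
```

The same state-carrying idea should translate to the guarded quirky update by replacing the SET statements with variable assignments inside the UPDATE, but I have not timed that variant.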