I am new to T-SQL and window functions.
I can't explain why the following two queries produce the same result:
SELECT
empid, ordermonth, val,
SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runval
FROM
Sales.EmpOrders;
and
SELECT
empid, ordermonth, val,
SUM(val) OVER(PARTITION BY empid ORDER BY ordermonth) AS runval
FROM
Sales.EmpOrders;
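The output is the same in both cases.
Shouldn't the second query produce the same total value for every empid? Or is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW the default, and therefore optional, when ORDER BY is used in the OVER clause?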
Answer 0 (score: 1)
If you want the same value for every empid, don't use ORDER BY:
SELECT empid, ordermonth, val,
SUM(val) OVER (PARTITION BY empid) AS runval
FROM Sales.EmpOrders;
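Otherwise, your two expressions are the same, provided the sort key is unique. The default is explained in the documentation:
If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame.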
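To make the quoted default concrete, here is a minimal sketch of the second query from the question with the frame written out explicitly. It assumes the same Sales.EmpOrders table and should behave the same as the version that omits the frame:

```sql
-- Writing out the default frame: with ORDER BY and no ROWS/RANGE clause,
-- SQL Server uses RANGE, not ROWS.
SELECT empid, ordermonth, val,
       SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runval
FROM Sales.EmpOrders;
```

This only matches the ROWS version from the question when ordermonth is unique within each empid.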
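Answer 1 (score: 1)
For running sums (or similar figures), the difference becomes visible when two rows tie on the ORDER BY ... columns. Consider the following example, in which the employee has two orders on 2006-09-01: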
DECLARE @T TABLE (empid INT, ordermonth DATE, val INT);
INSERT INTO @T VALUES
(1, '2006-07-01', 100),
(1, '2006-08-01', 100),
(1, '2006-09-01', 100),
(1, '2006-09-01', 100),
(1, '2006-10-01', 100);
SELECT empid, ordermonth, val,
runval_rows = SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
runval_auto = SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth)
FROM @T;
empid | ordermonth | val | runval_rows | runval_auto
1 | 2006-07-01 | 100 | 100 | 100
1 | 2006-08-01 | 100 | 200 | 200
1 | 2006-09-01 | 100 | 300* | 400*
1 | 2006-09-01 | 100 | 400* | 400*
1 | 2006-10-01 | 100 | 500 | 500
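When no ROWS/RANGE clause is specified, SQL Server defaults to:
If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame.
In the simplest terms, a range is the set of rows within the partition that have the same value in the columns specified in the ORDER BY clause. The second variant therefore treats rows 3 and 4 as part of the same range and includes both of them when computing the running sum.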
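If you want a row-by-row running total even when ordermonth has ties, one option is to add a unique tiebreaker to the ORDER BY. The sketch below assumes a hypothetical orderid column that is not part of the original example. With a unique sort key, every row is its own range, so the ROWS and default (RANGE) variants should agree:

```sql
-- orderid is a hypothetical unique column, added for illustration only.
DECLARE @T TABLE (orderid INT, empid INT, ordermonth DATE, val INT);
INSERT INTO @T VALUES
(1, 1, '2006-07-01', 100),
(2, 1, '2006-08-01', 100),
(3, 1, '2006-09-01', 100),
(4, 1, '2006-09-01', 100),
(5, 1, '2006-10-01', 100);

SELECT empid, ordermonth, val,
       -- Explicit ROWS frame, ordered by a unique key.
       runval_rows = SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth, orderid
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
       -- Default RANGE frame; no peers exist because the sort key is unique.
       runval_auto = SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth, orderid)
FROM @T
ORDER BY empid, ordermonth, orderid;
```

Both columns should then read 100, 200, 300, 400, 500, and the order in which the two 2006-09-01 rows are accumulated is no longer arbitrary.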