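I am trying to create a running-average column in a SELECT statement, computed from a column in the n previous rows of the same SELECT statement. The average I need is based on the previous n rows of the result set.
Let me explain: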
Id Number Average
1 1 NULL
2 3 NULL
3 2 NULL
4 4 2 <----- Average of (1, 3, 2), Numbers from previous 3 rows
5 6 3 <----- Average of (3, 2, 4), Numbers from previous 3 rows
. . .
. . .
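The first 3 rows of the Average column are NULL because fewer than three rows precede them. Row 4 of the Average column shows the average of the Number column over the previous 3 rows.
I need some help constructing a SQL SELECT statement that will do this.
Answer 0 (score: 11)
This should do it: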
--Test Data
CREATE TABLE RowsToAverage
(
ID int NOT NULL,
Number int NOT NULL
)
INSERT RowsToAverage(ID, Number)
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 2
UNION ALL
SELECT 4, 4
UNION ALL
SELECT 5, 6
UNION ALL
SELECT 6, 8
UNION ALL
SELECT 7, 10
--The query
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM RowsToAverage rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
Answer 1 (score: 8)
Assuming the Id column is sequential (no gaps), here is a simplified query for a table named "MyTable":
SELECT
b.Id,
b.Number,
(
SELECT
AVG(a.Number)
FROM
MyTable a
WHERE
a.id >= (b.Id - 3)
AND a.id < b.Id
AND b.Id > 3
) as Average
FROM
MyTable b;
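Answer 2 (score: 2)
A simple self join seems to perform much better than a row-referencing (correlated) subquery.
Generate 10k rows of test data: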
drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))
;WITH digits AS (
SELECT 0 as Number
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
)
,numbers as (
SELECT
(thousands.Number * 1000)
+ (hundreds.Number * 100)
+ (tens.Number * 10)
+ ones.Number AS Number
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers
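I would pull the special case of the first 3 rows out of the main query; if you really want those rows in the rowset, you can UNION ALL them back in. The self-join query: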
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
avg(trailing.Number) as MovingAverage
FROM NumberedRows nr
join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3 -- skip the first 3 rows; filter on row position, not on the data value
group by nr.id, nr.Number
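On my machine this takes about 10 seconds, while the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to use my test source table):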
;WITH NumberedRows
AS
(
SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM test10k rta
)
SELECT nr.ID, nr.Number,
CASE
WHEN nr.RowNumber <=3 THEN NULL
ELSE ( SELECT avg(Number)
FROM NumberedRows
WHERE RowNumber < nr.RowNumber
AND RowNumber >= nr.RowNumber - 3
)
END AS MovingAverage
FROM NumberedRows nr
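If you run SET STATISTICS PROFILE ON, you can see that the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, the aggregate, and other steps.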
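The UNION ALL mentioned above, for putting the first 3 rows back into the rowset, might look roughly like the sketch below. It reuses the test10k table and the NumberedRows CTE from this answer. Two things are my own assumptions: the first three rows get a NULL average (cast to int so both branches have the same type), and the filter is on RowNumber rather than on Number so it does not depend on the values in the test data.

```sql
;WITH NumberedRows
AS
(
    SELECT rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM test10k rta
)
-- Branch 1: the first 3 rows have no full window of 3 preceding rows,
-- so they are returned with a NULL average.
SELECT nr.ID, nr.Number, CAST(NULL AS int) AS MovingAverage
FROM NumberedRows nr
WHERE nr.RowNumber <= 3
UNION ALL
-- Branch 2: every later row is joined to its 3 preceding rows and averaged.
SELECT nr.ID, nr.Number, avg(trailing.Number) AS MovingAverage
FROM NumberedRows nr
JOIN NumberedRows AS trailing
    ON trailing.RowNumber BETWEEN nr.RowNumber - 3 AND nr.RowNumber - 1
WHERE nr.RowNumber > 3
GROUP BY nr.ID, nr.Number
-- ORDER BY applies to the combined result of both branches.
ORDER BY ID
```

I have not timed this variant, so the 10 second figure above should not be assumed to carry over to it.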
Answer 3 (score: 1)
Edit: I missed that it should average the previous three records...
For a general running average, I think something like this would work:
SELECT
id, number,
SUM(number) OVER (ORDER BY ID) /
ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
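For the "previous three records" case from the question, the same windowed style can be restricted with a ROWS frame. This is only a sketch and swaps in a different technique from the query above: it assumes SQL Server 2012 or later (or another engine that supports ROWS frames in the OVER clause) and uses the RowsToAverage test table from Answer 0.

```sql
-- Trailing average over the 3 rows before the current one.
-- Assumes SQL Server 2012+ and the RowsToAverage table from Answer 0.
SELECT
    ID,
    Number,
    CASE
        -- Rows 1 to 3 do not have 3 preceding rows, so report NULL for them.
        WHEN ROW_NUMBER() OVER (ORDER BY ID) <= 3 THEN NULL
        -- The frame covers the 3 rows immediately before the current row
        -- and excludes the current row itself.
        ELSE AVG(Number) OVER (ORDER BY ID
                               ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
    END AS MovingAverage
FROM RowsToAverage
ORDER BY ID;
```

Because Number is an int, AVG returns an int here as well; multiply by 1.0 inside AVG if you want a fractional average.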
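Answer 4 (score: 0)
Check out some of the solutions here. I'm sure you could adapt one of them easily enough.
Answer 5 (score: 0)
If you want this to be truly high-performance and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running-total aggregate would be by far the most efficient way to calculate a running average like this.
Answer 6 (score: 0)
Alternatively, you can denormalize and store precomputed running values. Described here:
Select performance is as fast as it gets. Of course, modifications are slower.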
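A rough sketch of what storing the precomputed value could look like, not taken from the linked description. The MovingAverage column name is hypothetical, the table is RowsToAverage from Answer 0, and the update assumes ID is sequential and starts at 1, as in Answer 1.

```sql
-- Hypothetical column that holds the precomputed trailing average.
ALTER TABLE RowsToAverage ADD MovingAverage int NULL;
-- GO is a batch separator understood by SSMS and sqlcmd; it makes the
-- new column visible to the UPDATE that follows.
GO

-- Refresh the stored values. This has to be rerun (or done in a trigger)
-- whenever Number changes, which is the slower-modification cost.
-- Assumes ID is sequential with no gaps and starts at 1.
UPDATE t
SET MovingAverage =
    (
        SELECT AVG(p.Number)
        FROM RowsToAverage AS p
        WHERE p.ID >= t.ID - 3
          AND p.ID <  t.ID
          AND t.ID > 3       -- the first 3 rows stay NULL
    )
FROM RowsToAverage AS t;

-- Reading the running average is now a plain select.
SELECT ID, Number, MovingAverage
FROM RowsToAverage
ORDER BY ID;
```

The trade-off is the one stated above: reads become trivial, while every insert, update, or delete that touches Number has to bring the stored averages of the following rows up to date.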