SQL SELECT statement for calculating a running average column

Time: 2009-05-26 15:32:07

Tags: sql-server tsql sql-server-2008 aggregate

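I am trying to add a running average column to a SELECT statement, where each value is the average of a column over the previous n rows of that same result set.

Let me explain with an example: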

Id        Number       Average
 1             1          NULL
 2             3          NULL
 3             2          NULL
 4             4             2 <----- Average of (1, 3, 2), the Numbers from the previous 3 rows
 5             6             3 <----- Average of (3, 2, 4), the Numbers from the previous 3 rows
 .             .             .
 .             .             .

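The first 3 rows of the Average column are NULL because they do not have 3 previous rows. Row 4 of the Average column shows the average of the Number column over the previous 3 rows.

I need some help constructing a SQL SELECT statement that will do this.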

7 answers:

Answer 0: (score: 11)

This should do it:

--Test Data
CREATE TABLE    RowsToAverage
    (
    ID int NOT NULL,
    Number int NOT NULL
    )

INSERT  RowsToAverage(ID, Number)
SELECT  1, 1
UNION ALL
SELECT  2, 3
UNION ALL
SELECT  3, 2
UNION ALL
SELECT  4, 4
UNION ALL
SELECT  5, 6
UNION ALL
SELECT  6, 8
UNION ALL
SELECT  7, 10

--The query
;WITH   NumberedRows
AS
(
SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM    RowsToAverage rta
)

SELECT  nr.ID, nr.Number,
        CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                    FROM    NumberedRows 
                    WHERE   RowNumber < nr.RowNumber
                    AND     RowNumber >= nr.RowNumber - 3
                )
        END AS MovingAverage
FROM    NumberedRows nr
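
On SQL Server 2012 or later, the same result could probably be written with a window frame instead of a correlated subquery. Note that this version is an assumption: the question is tagged sql-server-2008, where this frame syntax is not available. A minimal sketch against the RowsToAverage test table above:

```sql
-- Sketch only: requires SQL Server 2012+ (ROWS frames in the OVER clause).
-- The frame covers the 3 rows before the current row, excluding the current row.
SELECT  ID, Number,
        CASE
            -- Return an average only when all 3 previous rows exist.
            WHEN COUNT(*) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) = 3
            THEN AVG(Number) OVER (ORDER BY ID ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
        END AS MovingAverage   -- NULL for the first 3 rows (CASE without ELSE)
FROM    RowsToAverage
ORDER BY ID;
```

With the test data above this should give NULL for IDs 1 to 3, then 2, 3, 4 and 6. Because Number is an int, AVG returns an int, just as in the subquery version.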

Answer 1: (score: 8)

Assuming the Id column is sequential (no gaps), here is a simplified query for a table named "MyTable":

SELECT 
    b.Id,
    b.Number,
    (
      SELECT 
       AVG(a.Number) 
      FROM 
       MyTable a 
     WHERE 
       a.id >= (b.Id - 3) 
       AND a.id < b.Id
       AND b.Id > 3 
     ) as Average
FROM 
    MyTable b;
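
To see why the sequential-Id assumption matters, here is a small sketch that runs the same query against Ids with a gap. The table variable @Gappy and its values are made up for illustration:

```sql
-- Hypothetical data: Ids 3 and 4 are missing.
DECLARE @Gappy TABLE (Id int NOT NULL PRIMARY KEY, Number int NOT NULL);

INSERT @Gappy (Id, Number)
VALUES (1, 1), (2, 3), (5, 8), (6, 4);

SELECT  b.Id,
        b.Number,
        (   SELECT  AVG(a.Number)
            FROM    @Gappy a
            WHERE   a.Id >= (b.Id - 3)
            AND     a.Id < b.Id
            AND     b.Id > 3
        ) AS Average
FROM    @Gappy b
ORDER BY b.Id;
-- Id 5 is only the 3rd row, yet it gets 3 (the average of Id 2 alone).
-- Id 6 gets 8 (Id 5 alone) instead of 4, the average of (1, 3, 8).
```

When gaps are possible, numbering the rows with ROW_NUMBER() first, as in the other answers, avoids this.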

Answer 2: (score: 2)

A simple self join seems to perform much better than a row-referencing subquery.

Generate 10k rows of test data:

drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))

;WITH digits AS (
    SELECT 0 as Number
    UNION SELECT 1
    UNION SELECT 2
    UNION SELECT 3
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9
)
,numbers as (
    SELECT 
        (thousands.Number * 1000) 
        + (hundreds.Number * 100) 
        + (tens.Number * 10) 
        + ones.Number AS Number
    FROM digits AS ones 
    CROSS JOIN digits AS tens
    CROSS JOIN digits AS hundreds
    CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers 

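I would pull the special case of the first 3 rows out of the main query; you can UNION ALL them back in if you really want them in the row set. The self-join query: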

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)

SELECT  nr.ID, nr.Number,
    avg(trailing.Number) as MovingAverage
FROM    NumberedRows nr
    join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.RowNumber > 3  -- filter on row position, not on the data value
group by nr.id, nr.Number

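On my machine this takes about 10 seconds, while the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to use my test source table):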

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)
SELECT  nr.ID, nr.Number,
    CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                            FROM    NumberedRows 
                            WHERE   RowNumber < nr.RowNumber
                            AND             RowNumber >= nr.RowNumber - 3
                    )
    END AS MovingAverage
FROM    NumberedRows nr

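If you run SET STATISTICS PROFILE ON, you can see that the self join has 10k executes on the table spool, whereas the subquery has 10k executes on the filter, the aggregate, and other steps.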
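
The UNION ALL mentioned above, which puts the first 3 rows back into the row set with a NULL average, might look roughly like this. It is a sketch against the same test10k table, not something that was timed as part of the comparison:

```sql
;WITH NumberedRows AS
(
    SELECT  rta.*, ROW_NUMBER() OVER (ORDER BY rta.Id ASC) AS RowNumber
    FROM    test10k rta
)
-- First 3 rows: not enough history, so the average is NULL.
SELECT  nr.Id, nr.Number, CAST(NULL AS int) AS MovingAverage
FROM    NumberedRows nr
WHERE   nr.RowNumber <= 3

UNION ALL

-- Remaining rows: average of the 3 rows immediately before each row.
SELECT  nr.Id, nr.Number, AVG(trailing.Number)
FROM    NumberedRows nr
JOIN    NumberedRows AS trailing
        ON trailing.RowNumber BETWEEN nr.RowNumber - 3 AND nr.RowNumber - 1
WHERE   nr.RowNumber > 3
GROUP BY nr.Id, nr.Number
ORDER BY Id;   -- applies to the combined result
```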

Answer 3: (score: 1)

Edit: I missed that it should average the previous three records...

For a general running average, I think something like this would work:

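-- Note: ORDER BY inside an aggregate's OVER clause needs SQL Server 2012 or later.
-- SUM(number) and ROW_NUMBER() are both integers here, so the division truncates.
SELECT
    id, number, 
    SUM(number) OVER (ORDER BY ID) / 
       ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID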
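
On SQL Server 2008, where aggregates do not accept ORDER BY in the OVER clause, a cumulative average can probably be computed with a correlated subquery instead. A minimal sketch, assuming the same myTable(id, number) and that id defines the row order:

```sql
-- Cumulative average of all rows up to and including the current one.
SELECT  b.id, b.number,
        (   SELECT  AVG(CAST(a.number AS decimal(18, 4)))  -- cast avoids integer truncation
            FROM    myTable a
            WHERE   a.id <= b.id
        ) AS RunningAverage
FROM    myTable b
ORDER BY b.id;
```

This reads every earlier row for each output row, so it is likely to slow down noticeably on large tables.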

Answer 4: (score: 0)

Check out some of the solutions here. I'm sure you could adapt one of them fairly easily.

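Answer 5: (score: 0)

If you want this to be truly performant, and aren't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user-defined aggregate functions. A custom running-total aggregate would be by far the most efficient way to calculate a running average like this.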

Answer 6: (score: 0)

Alternatively, you can denormalize and store precalculated running values. This is described here:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

Select performance is as fast as it gets. Of course, modifications are slower.