Question

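I would like to speed up the following (PostgreSQL) code. I suspect that getting rid of (some of) the loops would help, but I don't see a way to do it. Any suggestions for speeding it up are welcome. Thanks in advance!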

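The code computes some statistics (mean, slope) for each column over different sections of the data. A section is defined by a sliding time window (e.g. 60 minutes). So the code below:

loops over the columns whose statistics I am interested in;

for each column, moves the time window step by step and computes the statistics of the values in that window.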

for col_name in ..... loop      -- ..... stands for a list of column names
truncate small_table;           -- where statistics are temporarily stored
for cur in select time from big_table loop
    execute 'select regr_slope('|| col_name ||', time) as slope,'
        || ' avg(' || col_name || ') as mean'
        || ' from big_table where'
        || ' time <=' || cur.time
        || ' and time >=' || cur.time-60
        into result;

    execute 'insert into small_table values($1,$2,$3)'
        using cur.time, result.slope, result.mean;
end loop;

execute 'update big_table set '
    || col_name || '_slope = small_table.slope, '
    || col_name || '_mean = small_table.mean '
    || ' where big_table.time=small_table.time';
end loop;

small_table was introduced to avoid multiple updates on big_table.

There are actually quite a lot of columns (~50); could that be another factor slowing things down?

Answer 1

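If you generate the following SQL pattern dynamically, you can at least do all of this in a single query. I'm not sure whether it will perform any better (obviously you need to loop over all the columns and add them). I would test the performance before worrying about building the SQL in code.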

Update
    big_table b
Set
    field1_slope = x.field1_slope,
    field1_mean = x.field1_mean,
    field2_slope = x.field2_slope,
    field2_mean = x.field2_mean
From (
    Select
        b1.time,
        regr_slope(b2.field1, b2.time) field1_slope,
        avg(b2.field1) field1_mean,
        regr_slope(b2.field2, b2.time) field2_slope,
        avg(b2.field2) field2_mean
    From
        big_table b1
            Inner Join
        big_table b2
            -- same window as in the question: [b1.time - 60, b1.time]
            On b2.time <= b1.time and b2.time >= b1.time - 60
    Group By
        b1.time
    ) x
Where
    b.time = x.time;
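
For roughly 50 columns, the statement above could be assembled from a column list instead of written by hand. The following is only a sketch under assumptions: the array `cols` is a hypothetical stand-in for the real column list, the `<col>_slope` and `<col>_mean` columns are assumed to exist already (as in the question), and the window is written as in the question (`time - 60` to `time`). It uses PostgreSQL's `format()` with `%I` to quote identifiers.

```sql
DO $$
DECLARE
    -- hypothetical column list; replace with the real ~50 column names
    cols     text[] := ARRAY['field1', 'field2'];
    set_list text;   -- "field1_slope = x.field1_slope, field1_mean = x.field1_mean, ..."
    agg_list text;   -- "regr_slope(b2.field1, b2.time) AS field1_slope, avg(...) AS ..., ..."
BEGIN
    SELECT string_agg(format('%1$I = x.%1$I, %2$I = x.%2$I',
                             c || '_slope', c || '_mean'), ', '),
           string_agg(format('regr_slope(b2.%1$I, b2.time) AS %2$I, avg(b2.%1$I) AS %3$I',
                             c, c || '_slope', c || '_mean'), ', ')
      INTO set_list, agg_list
      FROM unnest(cols) AS c;

    -- one UPDATE for all columns instead of one pass per column
    EXECUTE format(
        'UPDATE big_table b SET %s
           FROM (SELECT b1.time, %s
                   FROM big_table b1
                   JOIN big_table b2
                     ON b2.time <= b1.time AND b2.time >= b1.time - 60
                  GROUP BY b1.time) x
          WHERE b.time = x.time',
        set_list, agg_list);
END
$$;
```

Whether this is actually faster than the loop still needs to be measured on the real data.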

I'm not overly familiar with PostgreSQL; there may be a way to eliminate one of the references to the big table.
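
One possible way to drop the second reference to big_table is a window function with a `RANGE` frame. This is a sketch, not a tested solution: it assumes PostgreSQL 11 or later (where `RANGE` frames accept an offset) and that `time` is a numeric column in the same units as the 60 in the question. The frame covers `time - 60` up to and including the current `time`, which is the closed window used in the question. Only `field1` is shown; further columns would reuse the same named window `w`.

```sql
UPDATE big_table b
SET field1_slope = x.field1_slope,
    field1_mean  = x.field1_mean
FROM (
    SELECT
        t.time,
        -- ordinary aggregates can be used as window functions via OVER
        regr_slope(t.field1, t.time) OVER w AS field1_slope,
        avg(t.field1)                OVER w AS field1_mean
    FROM big_table t
    -- all rows whose time lies in [current time - 60, current time]
    WINDOW w AS (ORDER BY t.time RANGE BETWEEN 60 PRECEDING AND CURRENT ROW)
) x
WHERE b.time = x.time;
```

The subquery reads big_table once and yields one row per input row, so no `GROUP BY` is needed. As with the join version, it is worth comparing timings before committing to it.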

Example SQL Fiddle

Another way with cursors

Avoiding loops when computing running averages / statistics on columns
