For loop that updates columns with lag and deletes rows by condition becomes very slow

Asked: 2019-11-21 05:25:12

Tags: postgresql

I am creating a for-loop function that updates several columns (time, distance and speed) computed from the current and previous row, and deletes rows whose updated speed column exceeds a cutoff value. The sample table has about 500,000 records, but the function has been running for hours and still has not finished. Indexes, work_mem, fillfactor and VACUUM FULL make no significant difference. Below is the function I came up with.

create or replace function speed_cal_cutoff()
returns void as
$body$
declare
   t_curs cursor for 
      select "id", loggerid, datecon, timecon, 
      time_interval, gcs_distance, gcs_geom, 
      interval_seconds, calculated_speed from mytable;
begin
  for t_row in t_curs
    loop

        update mytable
            set time_interval = (concat(datecon|| ' ' ||timecon)::timestamp) - prev_datetime 
            from (select "id", 
                  lag(loggerid) over (partition by loggerid order by datecon, timecon) as prev_loggerid, 
                  lag(concat(datecon|| ' ' ||timecon)::timestamp) over (partition by loggerid order by loggerid, datecon, timecon) as prev_datetime
                  from mytable) as subquery
        where mytable."id" = subquery.id 
            and mytable.loggerid = subquery.prev_loggerid
            and mytable."id" = t_row.id; 

        update mytable
            set gcs_distance = subquery.gcs_distance
            from (select "id", ST_Distance(gcs_geom::geography, lag(gcs_geom::geography) over (partition by loggerid order by loggerid, datecon, timecon asc)) as gcs_distance, 
                  lag(loggerid) over (partition by loggerid order by datecon, timecon) as prev_loggerid
                  from mytable) as subquery
        where mytable."id" = subquery.id
            and mytable.loggerid = subquery.prev_loggerid
            and mytable."id" = t_row.id; 

        update mytable
            set interval_seconds = (extract(EPOCH from time_interval))
        where mytable."id" = t_row.id; 

        update mytable  
            set calculated_speed = gcs_distance/interval_seconds
        where mytable."id" = t_row.id;

        delete from mytable where calculated_speed > 41.6667 
            and mytable."id" = t_row.id; 

  end loop;
end
$body$
language plpgsql; 

How can I optimize the code to get better performance?

1 answer:

Answer 0 (score: 0)

Postgres does not like a huge number of repeated updates inside one transaction - mainly when the same rows are updated multiple times. The reason is how Postgres implements its MVCC architecture: every UPDATE leaves a dead row version behind that cannot be cleaned up while the transaction is still open. What you can do:
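You can observe this effect directly (an illustrative query, not part of the original answer): the statistics views track the dead row versions left behind by updates, so after a long update loop you can check how much bloat has accumulated on the table:

```sql
-- Illustrative: inspect dead tuples accumulated by repeated updates.
-- The table name "mytable" is taken from the question.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'mytable';
```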

a) Try to reduce the number of repeated updates by using arrays. Arrays are purely in-memory structures - and if you have Postgres 10 or newer, array updates are cheap.
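A rough sketch of the array idea (hypothetical code; only the `id` and `calculated_speed` column names are taken from the question): instead of issuing one DELETE per row inside the loop, collect the matching ids in an in-memory array and issue a single DELETE at the end:

```sql
-- Sketch: accumulate ids in an array, then do one DELETE
-- instead of one DELETE statement per cursor row.
do $$
declare
   ids_to_delete bigint[] := '{}';
   r record;
begin
   for r in select "id", calculated_speed from mytable loop
      if r.calculated_speed > 41.6667 then
         ids_to_delete := ids_to_delete || r."id";
      end if;
   end loop;

   -- one statement touches the table once, not 500,000 times
   delete from mytable where "id" = any (ids_to_delete);
end
$$;
```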

b) Try to make the transactions smaller. If you can break the work into a set of smaller transactions, the table heap can be cleaned up effectively between them, and this change can speed up execution significantly.

# bad technique, pseudocode
begin;
for i in 1 .. 100 loop
  for j in 1 .. 1000 loop -- any value will be updated 100x without cleaning
    update tab set v = j + i where pk = j;
  end loop;
end loop;
commit;

# better
for i in 1 .. 100 loop
  begin;
  for j in 1 .. 1000 loop -- each value is updated once per transaction; cleanup can run between commits
    update tab set v = j + i where pk = j;
  end loop;
  commit;
end loop;
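The "better" pattern above cannot be written as a plain function, because a function runs inside a single transaction and cannot COMMIT. From Postgres 11 on it can be expressed as a procedure with transaction control (a sketch using the hypothetical `tab` table from the pseudocode above):

```sql
-- Sketch: Postgres 11+ procedure that commits after every batch,
-- so autovacuum can remove dead tuples between batches.
create or replace procedure batched_update()
language plpgsql
as $$
begin
   for i in 1 .. 100 loop
      for j in 1 .. 1000 loop
         update tab set v = j + i where pk = j;
      end loop;
      commit;  -- ends the current transaction and starts a new one
   end loop;
end
$$;

call batched_update();
```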