Question

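I have an architecture in which some data is copied into a temporary (heap) table. Then, on command, the rows matching a condition must be copied into another table, where counts, deletes and updates are run based on object_id, which is the same across all the tables. The longest operation is the copy: it takes 10 minutes for 300,000 rows! For example:

    insert into t1 (t1_f1, t1_f2, name, value) SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value where loading_process_id = 695

Can I speed this up? Or is this a bad architecture that I have to change?

More details: the heap table can contain a very large amount of data, and a copy can involve a few million rows. Some fields are indexed in the heap table and in the other tables (for counting or filtering).

Here is the plan for a not-so-large data set: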

    Insert on main_like  (cost=2993.63..3115.51 rows=6094 width=797) (actual time=6143.194..6143.194 rows=0 loops=1)
      ->  Subquery Scan on "*SELECT*"  (cost=2993.63..3115.51 rows=6094 width=797) (actual time=55.995..125.081 rows=6094 loops=1)
            ->  Unique  (cost=2993.63..3024.10 rows=6094 width=796) (actual time=55.909..79.237 rows=6094 loops=1)
                  ->  Sort  (cost=2993.63..3008.86 rows=6094 width=796) (actual time=55.904..69.195 rows=6094 loops=1)
                        Sort Key: main_loadingprocessobjects.object_id
                        Sort Method: quicksort  Memory: 3321kB
                        ->  Seq Scan on main_loadingprocessobjects  (cost=0.00..465.02 rows=6094 width=796) (actual time=0.578..8.285 rows=6094 loops=1)
                              Filter: (loading_process_id = 695)
                              Rows Removed by Filter: 1428
    Planning time: 0.394 ms
    Execution time: 6143.631 ms

EXPLAIN without the insert:

    Unique  (cost=2993.63..3024.10 rows=6094 width=796) (actual time=48.915..52.902 rows=6094 loops=1)
      ->  Sort  (cost=2993.63..3008.86 rows=6094 width=796) (actual time=48.911..49.959 rows=6094 loops=1)
            Sort Key: object_id
            Sort Method: quicksort  Memory: 3321kB
            ->  Seq Scan on main_loadingprocessobjects  (cost=0.00..465.02 rows=6094 width=796) (actual time=0.401..5.516 rows=6094 loops=1)
                  Filter: (loading_process_id = 695)
                  Rows Removed by Filter: 1428
    Planning time: 0.214 ms
    Execution time: 53.694 ms

Here main_loadingprocessobjects is the heap table and main_like is t1.
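
For reference, two plans like the ones above could be produced roughly as follows. This is only a sketch: the column names t1_f1, t1_f2, name and value are the placeholders from the question, not the real columns of main_like. Because EXPLAIN ANALYZE actually executes the statement, the INSERT is wrapped in a transaction and rolled back so that no rows are kept.

```sql
-- Plan for the SELECT alone. EXPLAIN ANALYZE runs the query and
-- reports actual timings next to the planner's estimates.
EXPLAIN ANALYZE
SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value  -- placeholder columns
FROM main_loadingprocessobjects
WHERE loading_process_id = 695;

-- Plan for the full INSERT. EXPLAIN ANALYZE really performs the insert,
-- so run it inside a transaction and roll back afterwards.
BEGIN;
EXPLAIN ANALYZE
INSERT INTO main_like (t1_f1, t1_f2, name, value)          -- placeholder columns
SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value
FROM main_loadingprocessobjects
WHERE loading_process_id = 695;
ROLLBACK;
```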

Answer 1

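There are a few points you may want to consider for this problem:

A COPY statement is faster than an INSERT ... SELECT statement.
Create a composite index on the columns used in the filter, for example (type, category) for a query like this:

    SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value where type = 'TI' and category = 'add'

A GROUP BY statement can be faster than a DISTINCT statement.
If you expect heavy use of temporary tables, increase temp_buffers in postgresql.conf.
Try using a CTE (common table expression) instead of a temporary table.

I hope this helps; it should be useful for my own future development too.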
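
Some of the suggestions above could be sketched as follows. This is a minimal illustration, not a tuned solution: the table names heap and t1 stand in for the question's source and target tables, the index name and the literal filter values are invented for the example, and '256MB' is an arbitrary value rather than a recommendation.

```sql
-- Hypothetical names: heap is the source table, t1 the target table,
-- and heap_type_category_idx is an index name chosen for illustration.

-- Composite index on the two columns used in the WHERE clause.
CREATE INDEX heap_type_category_idx ON heap (type, category);

-- More memory for temporary tables in this session. This has to be set
-- before the session first uses a temporary table.
SET temp_buffers = '256MB';

-- A CTE picks one row per object_id, and the same statement inserts
-- the result into the target table, with no intermediate temporary table.
WITH picked AS (
    SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value
    FROM heap
    WHERE type = 'TI' AND category = 'add'   -- placeholder filter values
    ORDER BY object_id
)
INSERT INTO t1 (t1_f1, t1_f2, name, value)
SELECT t1_f1, t1_f2, name, value
FROM picked;
```

Whether any of these changes actually helps depends on the data and the server, so it is worth comparing EXPLAIN ANALYZE output before and after each one.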

PostgreSQL: copying large data from table to table
