Optimizing a CSV-to-Postgres migration that uses a self-join UPDATE

Date: 2016-11-23 12:33:34

Tags: postgresql csv postgresql-9.1 postgresql-performance

I have a 1.5 GB CSV file with roughly 11,000,000 records that I am loading into a Postgres table.

What I have tried so far:

set local work_mem = '4000MB';

COPY truck_dispatch_logs(
    old_id, ms_truck_id, issue_date, ms_site_id, ms_product_id,
    act_dispatch, schedule_time, draw, truck_no, adjusted_time,
    default_schedule_time, log_date_time, location
)
FROM '/home/truckdispatchlog.csv'
DELIMITER ',' CSV HEADER;

Query returned successfully: 11696539 rows affected, 01:40:36 execution time.
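One commonly suggested lever for the COPY step is to load into an UNLOGGED staging table, which skips write-ahead logging and usually makes bulk loads faster. This is only a sketch, under the assumption that the CSV remains available for a reload (unlogged tables are truncated after a crash); the staging table name is hypothetical.

```sql
-- Sketch: bulk-load into an UNLOGGED staging table (no WAL writes).
-- Assumes the CSV can be re-loaded if the server crashes mid-load.
CREATE UNLOGGED TABLE truck_dispatch_logs_staging
    (LIKE truck_dispatch_logs INCLUDING DEFAULTS);

COPY truck_dispatch_logs_staging(
    old_id, ms_truck_id, issue_date, ms_site_id, ms_product_id,
    act_dispatch, schedule_time, draw, truck_no, adjusted_time,
    default_schedule_time, log_date_time, location
)
FROM '/home/truckdispatchlog.csv'
DELIMITER ',' CSV HEADER;
```

Both CREATE UNLOGGED TABLE and the LIKE clause are available in PostgreSQL 9.1, which matches the question's tags.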

CREATE INDEX truck_dispatch_logs_ms_truck_id ON truck_dispatch_logs(ms_truck_id);
CREATE INDEX truck_dispatch_logs_truck_id ON truck_dispatch_logs(truck_id);

UPDATE "truck_dispatch_logs" AS td
SET new_truck_id = temp.id
FROM trucks AS temp
WHERE td.ms_truck_id = temp.old_id;

This query ran for more than 3 days, so I cancelled it.
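One thing that stands out in the steps above: both indexes were created on the large truck_dispatch_logs table, but the UPDATE's join probes trucks by old_id, which has no index. A sketch, not a verified fix (the new_truck_id column name is taken from the question; adjust if your actual column is truck_id):

```sql
-- Index the join key on the lookup side, so each row of the big table
-- can find its match in trucks without a sequential scan.
CREATE INDEX trucks_old_id ON trucks(old_id);
ANALYZE trucks;

UPDATE truck_dispatch_logs AS td
SET    new_truck_id = t.id
FROM   trucks AS t
WHERE  td.ms_truck_id = t.old_id
  AND  td.new_truck_id IS DISTINCT FROM t.id;  -- skip rows already correct
```

The IS DISTINCT FROM guard avoids rewriting rows that already hold the right UUID, which matters if the UPDATE has to be restarted after a cancellation like the one described.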

The trucks table has an old_id of type integer and an id of type UUID.

Here is the table structure:

"truck_dispatch_logs"

"ms_truck_id";"integer"
"truck_id";"uuid"
"schedule_time";"timestamp without time zone"
--some more irrelevant columns

"trucks"

"id";"uuid"
"old_id";"integer"
--some more irrelevant columns

What can I do to improve the whole process?

0 Answers:

There are no answers.