Question

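I have about 1,500,000 records in a PostgreSQL 9.5 database table, and I receive a CSV file (through an HTTP POST request) containing ~1,500,000 new rows. Compared to the original rows, some are unchanged, some are different, and some have been removed.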

I then

truncate the old table
loop through the lines of the CSV file
insert each line into the table

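What I need is a way to do this without introducing a service interruption for my clients, i.e. the service should keep using the old data until all three steps are finished. At the moment the interruption lasts about 1 hour, which is the time it takes to read the CSV and insert all the new rows. If necessary, I can afford an interruption of 5 minutes.

How can I achieve this behavior?

Here is a shortened version of my Python script: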

cursor = conn.cursor(cursor_factory=DictCursor)
cursor.execute('TRUNCATE TABLE rows CASCADE')
with open(request.files.csv) as csv_file:
    for line in csv_file:
        row = parse_line(line)
        cursor.execute(
            '''INSERT INTO rows (name, bla, blu)
            VALUES (%(name)s, %(bla)s, %(blu)s)''',
            row,
        )
conn.commit()

Answer 1

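Use COPY instead of the with open(request.files.csv) loop, because COPY will load 1,500,000 rows from a CSV file into a table in seconds.
If those seconds (say, a minute) are still too long, just using a transaction will not help, because TRUNCATE requires a lock on the whole table, not on rows: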
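
As a rough sketch of the COPY suggestion from Python: psycopg2 cursors have a copy_expert method that streams a file-like object through COPY ... FROM STDIN, so the file does not have to be readable by the database server. The function name load_csv_with_copy is made up for illustration. The sketch assumes the upload is already valid CSV with no header row, with columns in the order name, bla, blu, so that no per-line parse_line step is needed.

```python
def load_csv_with_copy(conn, csv_file):
    """Bulk-load csv_file into "rows" with one COPY instead of per-row INSERTs.

    conn is assumed to be an open psycopg2 connection; csv_file is any
    file-like object with a read() method (assumption: headerless CSV whose
    columns are name, bla, blu in that order).
    """
    copy_sql = 'COPY "rows" (name, bla, blu) FROM STDIN WITH (FORMAT csv)'
    cursor = conn.cursor()
    try:
        # copy_expert sends the whole file to the server in one statement.
        cursor.copy_expert(copy_sql, csv_file)
        conn.commit()
    except Exception:
        # Leave the table untouched if the load fails part-way.
        conn.rollback()
        raise
    finally:
        cursor.close()
```

This only shortens the load. It does not remove the table lock discussed next if the table is truncated first.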

TRUNCATE acquires an ACCESS EXCLUSIVE lock on each table it operates on

So if you can rebuild all dependent objects on the table, the best option is probably:

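-- empty staging table with the same columns as "rows"
create table t_table as select * from "rows" where false;
-- placeholder path: a server-side COPY needs a file the server can read
-- (for a client-side file, use \copy in psql or COPY ... FROM STDIN)
copy t_table from '/path/to/file.csv' with (format csv);
-- build all needed dependent objects (indexes, constraints, triggers)
begin;
  alter table "rows" rename to "some_name";
  alter table "t_table" rename to "rows";
end;
-- users see a glitch of milliseconds at the switch (if you use memcache or similar, refresh it)
drop table "some_name";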
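
The same sequence could be driven from the Python service roughly as follows. This is only a sketch under assumptions: swap_in_new_data is a hypothetical name, conn is taken to be an open psycopg2 connection in its default (non-autocommit) mode, and the CSV is assumed to be headerless with columns in the same order as the table. The load is committed before the renames so that the ACCESS EXCLUSIVE lock taken by ALTER TABLE is held only for the short rename transaction.

```python
def swap_in_new_data(conn, csv_file):
    """Load csv_file into a staging table, then swap it in for "rows".

    conn is assumed to be an open psycopg2 connection (or any object with
    the same cursor/commit/rollback interface). Table names follow the SQL
    in the answer.
    """
    cursor = conn.cursor()
    try:
        # Phase 1: build and fill the staging table; "rows" stays readable.
        cursor.execute('CREATE TABLE t_table AS SELECT * FROM "rows" WHERE false')
        cursor.copy_expert('COPY t_table FROM STDIN WITH (FORMAT csv)', csv_file)
        # Indexes, constraints and triggers would be recreated here.
        conn.commit()

        # Phase 2: both renames in one transaction, so readers see either
        # the old table or the new one under the name "rows".
        cursor.execute('ALTER TABLE "rows" RENAME TO some_name')
        cursor.execute('ALTER TABLE t_table RENAME TO "rows"')
        conn.commit()

        # Phase 3: discard the old data.
        cursor.execute('DROP TABLE some_name')
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
```

Note that foreign keys in other tables refer to a table by identity, not by name, so any that pointed at the old table keep pointing at it after the rename and have to be recreated against the new one before the final DROP can succeed.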

Update: to copy the columns of the CSV into only some of the table's columns, list the columns:

COPY table_name [ ( column_name [, ...] ) ]
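
For illustration, a small helper could assemble such a statement with an optional column list. build_copy_sql is a hypothetical name, and the identifier quoting here is hand-rolled to keep the sketch dependency-free; with psycopg2 installed, its psycopg2.sql module is likely the better tool for composing identifiers.

```python
def build_copy_sql(table, columns=None):
    """Return a COPY ... FROM STDIN statement, optionally with a column list."""

    def quote_ident(name):
        # Double-quote the identifier and escape embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    sql = "COPY " + quote_ident(table)
    if columns:
        sql += " (" + ", ".join(quote_ident(c) for c in columns) + ")"
    return sql + " FROM STDIN WITH (FORMAT csv)"


print(build_copy_sql("t_table", ["name", "bla", "blu"]))
# COPY "t_table" ("name", "bla", "blu") FROM STDIN WITH (FORMAT csv)
```

Table columns left out of the list receive their default values, which is how a serial id column, for instance, would still get populated.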

Truncate rows and insert new rows without introducing a service interruption?
