Question

I code in Python and use psql to store my data. My problem is that writing to the database takes about 2-3 minutes. The data is about 1,200,000 rows and 3 columns.

The insert function:

def store_data(cur, table_name, data):
    # Insert every (name, date, id) row with executemany, then commit once.
    cur.executemany(
        "INSERT INTO " + table_name + " (name, date, id) VALUES (%s, %s, %s)",
        [(row[0], row[1], row[2]) for row in data]
    )

    cur.connection.commit()

How can I speed up this function?

Answer 1

Use the COPY command; see the Postgres documentation. Also take a look at the psycopg documentation on COPY.

Some numbers: 3 million rows with individual INSERTs: 3 hours. With COPY: 7 seconds.
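A minimal sketch of what the COPY approach could look like for the function in the question, assuming psycopg2 (whose cursors provide copy_expert) and the same (name, date, id) columns. The helper names rows_to_csv_buffer and store_data_copy are made up for illustration, and table_name is assumed to come from trusted code because it is concatenated into the statement.

```python
import csv
import io


def rows_to_csv_buffer(data):
    """Serialize (name, date, id) rows into an in-memory CSV file."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for name, date, id_ in data:
        # csv.writer quotes values containing commas, quotes or newlines.
        writer.writerow([name, date, id_])
    buf.seek(0)  # rewind so COPY reads from the start
    return buf


def store_data_copy(cur, table_name, data):
    """Load all rows with one COPY ... FROM STDIN instead of many INSERTs."""
    # Assumption: table_name is a trusted identifier, not user input.
    sql = "COPY " + table_name + " (name, date, id) FROM STDIN WITH (FORMAT csv)"
    cur.copy_expert(sql, rows_to_csv_buffer(data))
    cur.connection.commit()
```

It would be called exactly like the original, e.g. store_data_copy(cur, "my_table", data). For very large inputs it may be worth streaming from a file on disk rather than building the whole buffer in memory.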

Answer 2

The excellent PostgreSQL documentation has a detailed chapter on "Populating a Database".

Besides using COPY, as W.Mann suggested, there is more you can do if you have further performance requirements:

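Temporarily drop indexes
Temporarily drop foreign key and check constraints
Increase maintenance_work_mem
Increase max_wal_size
Disable WAL archiving and streaming replication
Run ANALYZE afterwards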
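As a rough sketch, some of the steps above can be wrapped around a bulk load from Python. Everything named here is hypothetical: the function bulk_load_with_tuning, the index name derived from the table name, the indexed column (name), and the 1GB value. The load_rows argument is any callable that performs the actual load with the given cursor. Settings such as max_wal_size and WAL archiving are server-level configuration and are not touched by this code.

```python
def bulk_load_with_tuning(cur, table_name, load_rows):
    """Drop an index, bulk load, rebuild the index, then ANALYZE."""
    # Hypothetical index name; use the real index names of your table.
    index_name = table_name + "_name_idx"

    # Session-level setting; it speeds up the CREATE INDEX below.
    cur.execute("SET maintenance_work_mem = '1GB'")

    # Without the index, rows are loaded without per-row index maintenance.
    cur.execute("DROP INDEX IF EXISTS " + index_name)

    load_rows(cur)  # e.g. a COPY-based loader

    # Rebuilding once over the full table is cheaper than incremental updates.
    cur.execute("CREATE INDEX " + index_name + " ON " + table_name + " (name)")

    # Refresh planner statistics for the newly loaded data.
    cur.execute("ANALYZE " + table_name)
    cur.connection.commit()
```

Foreign key and check constraints could be handled the same way with ALTER TABLE ... DROP CONSTRAINT before the load and ADD CONSTRAINT after it.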

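If you use pg_restore, you can try the -j option to run multiple jobs in parallel on a multiprocessor system. Also look at the other options given in the documentation linked above.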

Answer 3

See the documentation for executemany:

Warning
In its current implementation this method is not faster than 
executing execute() in a loop. For better performance you can use 
the functions described in Fast execution helpers.

In the same place, you can find the following link: http://initd.org/psycopg/docs/extras.html#fast-exec. They recommend:

 psycopg2.extras.execute_batch
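A minimal sketch of the question's function rewritten around execute_batch, assuming psycopg2 2.7 or later (where the fast execution helpers were added). The names build_insert_sql and store_data_batched and the page_size of 1000 are illustrative choices, not something from the original post.

```python
def build_insert_sql(table_name):
    """Return the parameterized INSERT used for every row."""
    # Assumption: table_name is a trusted identifier, not user input.
    return "INSERT INTO " + table_name + " (name, date, id) VALUES (%s, %s, %s)"


def store_data_batched(cur, table_name, data, page_size=1000):
    """Insert rows with execute_batch, which sends many statements per round trip."""
    # Imported here so this module still loads where psycopg2 is not installed.
    from psycopg2.extras import execute_batch

    # data is a sequence of (name, date, id) tuples.
    execute_batch(cur, build_insert_sql(table_name), data, page_size=page_size)
    cur.connection.commit()
```

The gain comes from fewer server round trips; a larger page_size means fewer round trips at the cost of bigger statements. This is likely still slower than COPY for millions of rows.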

Speeding up inserts into a table in a PostgreSQL database
