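Why do psycopg2 writes take so long?

Time: 2017-12-13 23:23:44

Tags: python postgresql

I wrote a script that updates a table. Because I couldn't find a way to do a "batch" update, my script updates the table one row at a time. I assumed that for a set of 100,000 rows the update would take a few seconds.

It doesn't. Each write takes about 100 ms, so the whole job takes (100,000 × 100 ms) / 1000 / 60 / 60 ≈ 2.78 hours. Why do the writes take so long?

Here is the code I'm using: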

import psycopg2
...
entries = get_all_entries()
conn = psycopg2.connect(params)
try:
    for entry in entries:
        cursor = conn.cursor()
        cursor.execute(UPDATE_QUERY.format(entry.field1, entry.field2))
        cursor.close()
finally:
    conn.close()

What am I doing wrong?

2 answers:

Answer 0: (score: 1)

Have you tried this?

cursor = conn.cursor()
for entry in entries:
    cursor.execute(UPDATE_QUERY.format(entry.field1, entry.field2))

cursor.close()

You can use https://docs.python.org/3/library/profile.html to profile this code.
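As a rough illustration of the profiling suggestion, here is a minimal sketch using the standard-library cProfile and pstats modules. The functions fake_update and run_updates are hypothetical stand-ins for the real loop around cursor.execute(); they are not part of the original script, and the sleep only mimics a slow write.

```python
import cProfile
import io
import pstats
import time


def fake_update(entry):
    # Hypothetical stand-in for cursor.execute(...): sleeps to mimic a slow write.
    time.sleep(0.001)


def run_updates(entries):
    # Same shape as the loop in the question: one call per entry.
    for entry in entries:
        fake_update(entry)


def profile_updates(entries):
    """Profile run_updates(entries) and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_updates(entries)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_updates(range(50)))
```

In the real script, the report should show whether the time goes into the execute() call itself (waiting on the server or the network) or into something else in the loop.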

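Answer 1: (score: 1)

Instead of updating the table row by row from the client, you can use the copy_from() method to upload the data to a server-side temporary table and then update the target table with a single SQL statement.

Here is a contrived example: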

#!/usr/bin/env python3

import time, psycopg2
from random import random
from io import StringIO

CRowCount = 100000

conn = psycopg2.connect('')
conn.autocommit = False

print('Prepare playground...')
cur = conn.cursor()
cur.execute("""
    drop table if exists foo;
    create table foo(i int primary key, x float);
    insert into foo select i, 0 from generate_series(1,%s) as i;
""", (CRowCount,))
print('Done.')
cur.close()
conn.commit()

print('\nTest update row by row...')
tstart = time.time()
cur = conn.cursor()
# One UPDATE statement (and one client/server round trip) per row
for i in range(1, CRowCount + 1):
    cur.execute('update foo set x = %s where i = %s', (random(), i))
conn.commit()
cur.close()
print('Done in %s s.' % (time.time() - tstart))

print('\nTest batch update...')
tstart = time.time()
cur = conn.cursor()
# Create temporary table to hold our data
cur.execute('create temp table t(i int, x float) on commit drop')
# Create and fill the buffer from which data will be uploaded
buf = StringIO()
for i in range(1, CRowCount + 1):
    buf.write('%s\t%s\n' % (i, random()))
buf.seek(0)
# Upload data from the buffer to the temporary table
cur.copy_from(buf, 't')
# Update test table using data previously uploaded
cur.execute('update foo set x = t.x from t where foo.i = t.i')
cur.close()
conn.commit()
print('Done in %s s.' % (time.time() - tstart))

Output:

Prepare playground...
Done.

Test update row by row...
Done in 62.1189928055 s.

Test batch update...
Done in 3.95668387413 s.

As you can see, the second approach is roughly 16 times faster (62.1 s versus 3.96 s).
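The batch approach can also be wrapped in a small reusable helper. The sketch below is one possible arrangement, not part of the original answer: the function names rows_to_copy_buffer and batch_update are made up for illustration, and it assumes an open psycopg2 connection with autocommit off and the foo(i, x) table from the example. The plain tab-separated serialization is only safe for values such as numbers, which contain no tabs, newlines, or backslashes.

```python
import io


def rows_to_copy_buffer(rows):
    """Serialize (i, x) pairs as tab-separated text, one row per line."""
    buf = io.StringIO()
    for i, x in rows:
        buf.write("%s\t%s\n" % (i, x))
    buf.seek(0)
    return buf


def batch_update(conn, rows):
    """Stage rows in a temp table via COPY, then apply them in one UPDATE.

    Assumption: conn is an open psycopg2 connection with autocommit off,
    and the table foo(i int, x float) already exists.
    """
    cur = conn.cursor()
    try:
        # The temp table disappears automatically when the transaction commits.
        cur.execute("create temp table t(i int, x float) on commit drop")
        cur.copy_from(rows_to_copy_buffer(rows), "t")
        cur.execute("update foo set x = t.x from t where foo.i = t.i")
        conn.commit()
    except Exception:
        # Leave the connection usable if any step fails.
        conn.rollback()
        raise
    finally:
        cur.close()
```

A call such as batch_update(conn, [(1, 0.25), (2, 0.75)]) would then replace the per-row loop. Whether it matches the speedup measured above depends on the data and the server.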