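I wrote a script that updates a table. Because I couldn't find a way to do the updates in batch, my script updates the table one row at a time. I assumed that for a set of 100,000 rows the update would take a few seconds.
It doesn't. Each write takes about 100 ms, so the whole run takes (100,000 × 100 ms) / 1000 / 60 / 60 ≈ 2.78 hours. Why do the writes take so long?
Here is the code I'm using: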
import psycopg2
...
entries = get_all_entries()
conn = psycopg2.connect(params)
try:
    for entry in entries:
        cursor = conn.cursor()
        cursor.execute(UPDATE_QUERY.format(entry.field1, entry.field2))
        cursor.close()
finally:
    conn.close()
What am I doing wrong?
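Answer 0 (score: 1)
Have you tried creating the cursor once, outside the loop?
cursor = conn.cursor()
for entry in entries:
    cursor.execute(UPDATE_QUERY.format(entry.field1, entry.field2))
cursor.close()
Then profile this code to see where the time goes.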
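One possible way to do that profiling, as a rough sketch: run the loop under the standard-library cProfile module and sort by cumulative time. The names `profile_call` and `slow_update` are hypothetical and not part of the original code; `slow_update` only stands in for the real update loop so the sketch runs without a database.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


def slow_update(n):
    """Hypothetical stand-in for the row-by-row update loop."""
    for _ in range(n):
        time.sleep(0.001)  # pretend this is one cursor.execute() round trip
    return n


if __name__ == "__main__":
    _, report = profile_call(slow_update, 20)
    print(report)
```

With the real loop, if most of the cumulative time sits inside `execute`, the cost is probably in the per-statement round trip to the server rather than in the Python code around it.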
Answer 1 (score: 1)
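Instead of updating the table row by row from the client, you can use the copy_from() method to upload the data to a server-side temporary table and then update the target table with a single SQL statement.
Here is a contrived example: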
#!/usr/bin/env python3
import time
from io import StringIO
from random import random

import psycopg2

CRowCount = 100000

conn = psycopg2.connect('')
conn.autocommit = False

print('Prepare playground...')
cur = conn.cursor()
cur.execute("""
    drop table if exists foo;
    create table foo(i int primary key, x float);
    insert into foo select i, 0 from generate_series(1,%s) as i;
""", (CRowCount,))
print('Done.')
cur.close()
conn.commit()

print('\nTest update row by row...')
tstart = time.time()
cur = conn.cursor()
# One UPDATE statement (and one round trip) per row
for i in range(1, CRowCount + 1):
    cur.execute('update foo set x = %s where i = %s', (random(), i))
conn.commit()
cur.close()
print('Done in %s s.' % (time.time() - tstart))

print('\nTest batch update...')
tstart = time.time()
cur = conn.cursor()
# Create temporary table to hold our data
cur.execute('create temp table t(i int, x float) on commit drop')
# Create and fill the buffer from which data will be uploaded
buf = StringIO()
for i in range(1, CRowCount + 1):
    buf.write('%s\t%s\n' % (i, random()))
buf.seek(0)
# Upload data from the buffer to the temporary table
cur.copy_from(buf, 't')
# Update test table using data previously uploaded
cur.execute('update foo set x = t.x from t where foo.i = t.i')
cur.close()
conn.commit()
print('Done in %s s.' % (time.time() - tstart))
Output:
Prepare playground...
Done.

Test update row by row...
Done in 62.1189928055 s.

Test batch update...
Done in 3.95668387413 s.
As you can see, the second approach is about 16 times faster (62.1 s versus 4.0 s).
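The same temp-table approach could be wrapped in a reusable function, sketched here under a few assumptions: `conn` is a psycopg2 connection with autocommit off, the target is the `foo(i, x)` table from the example, and the rows are numeric `(id, value)` pairs. The names `rows_to_copy_buffer` and `bulk_update` are made up for illustration.

```python
import io


def rows_to_copy_buffer(rows):
    """Serialize (id, value) pairs as tab-separated lines for COPY.

    Only meant for numbers: text values would need tabs, newlines and
    backslashes escaped before being written to the buffer.
    """
    buf = io.StringIO()
    for row_id, value in rows:
        buf.write("%d\t%r\n" % (row_id, float(value)))
    buf.seek(0)
    return buf


def bulk_update(conn, rows):
    """Upload rows to a temp table, then apply them with one UPDATE.

    Assumes conn behaves like a psycopg2 connection (autocommit off)
    and that table foo(i, x) exists, as in the example.
    """
    cur = conn.cursor()
    try:
        # The temp table disappears when the transaction commits.
        cur.execute("create temp table t(i int, x float) on commit drop")
        cur.copy_from(rows_to_copy_buffer(rows), "t")
        cur.execute("update foo set x = t.x from t where foo.i = t.i")
        updated = cur.rowcount
        conn.commit()
    except Exception:
        # Undo the partial work so the connection stays usable.
        conn.rollback()
        raise
    finally:
        cur.close()
    return updated
```

A call might look like `bulk_update(conn, [(1, 0.25), (2, 0.75)])`, which returns the row count reported by the final UPDATE.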