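I need to write a script that fetches a bunch of rows from a PostgreSQL database, computes a few values for each row, and then pushes the results back into the database (into an existing table). There is one catch: I need to be able to control the table name and column names programmatically.
My current solution looks like this: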
import itertools

import psycopg2
import psycopg2.extras
from psycopg2.extensions import AsIs, quote_ident


def upsert_estimated_values(con, table_name, timestamps, column_names, value_list, page_size=1000):
    # con is a psycopg2 connection
    # timestamps is a list of timestamps (which are the index in the DB table)
    # column_names is a list of columns in the DB
    # value_list is a list of tuples where each tuple
    # contains a value for every column in column_names
    # Every identifier and every value is a %s placeholder, so the table and
    # column names have to be supplied again in each entry of argslist.
    sql = 'INSERT INTO %s (timestamp_utc, ' + ', '.join('%s' for _ in range(len(column_names))) \
        + ')\n VALUES (%s, ' + ', '.join('%s' for _ in range(len(column_names))) \
        + ')\nON CONFLICT (timestamp_utc) DO UPDATE\n SET ' \
        + ', '.join('%s = %s' for _ in range(len(column_names))) \
        + ';'
    with con.cursor() as cur:
        # Quote the identifiers once; AsIs stops them from being quoted again as string literals
        tbl = AsIs(quote_ident(table_name, cur))
        cols = [AsIs(quote_ident(col, cur)) for col in column_names]
        data = []
        for vs, ts in zip(value_list, timestamps):
            data.append((
                tbl,
                *cols,
                ts,
                *vs,
                # interleave (column, value) pairs for the SET clause
                *itertools.chain(*zip(cols, vs))
            ))
        psycopg2.extras.execute_batch(
            cur, sql, page_size=page_size, argslist=data
        )
    con.commit()
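As far as I understand, psycopg2's execute_batch function builds one long SQL statement to minimize round trips to the database. But what actually happens in this case? I am passing the table name and the column names in every entry of argslist, and I am not sure that is a good idea.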
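To make the question concrete, here is a rough, database-free simulation of what execute_batch does, to the best of my understanding: it splits argslist into pages, renders every entry into its own statement, joins the statements of a page with semicolons, and sends each page in one round trip. The names paginate and simulate_execute_batch are hypothetical, and plain %-formatting stands in for cursor.mogrify, so this only illustrates the shape of what is sent and is not safe quoting.

```python
from itertools import islice


def paginate(seq, page_size):
    """Yield successive lists of at most page_size items."""
    it = iter(seq)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def simulate_execute_batch(template, argslist, page_size=100):
    """Return the strings that would be sent, one per round trip.

    Assumption: plain %-formatting replaces cursor.mogrify here, so the
    arguments must already be rendered as SQL text. Illustration only.
    """
    sent = []
    for page in paginate(argslist, page_size):
        statements = [template % tuple(args) for args in page]
        sent.append(";".join(statements))
    return sent


template = "INSERT INTO %s (timestamp_utc, %s) VALUES (%s, %s)"
# Each entry repeats the (already quoted) table and column name.
rows = [('"my_table"', '"col_a"', "'2020-01-0%d'" % i, str(i)) for i in range(1, 6)]
batches = simulate_execute_batch(template, rows, page_size=2)

for batch in batches:
    print(batch)
```

If this picture is right, each row still becomes a complete INSERT statement with the table and column names spelled out again; only the number of round trips goes down.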
Currently, this update takes about 8 times longer than fetching the data from the database. Any tips?
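One direction that might be worth trying, sketched under assumptions rather than measured: build the statement once with the identifiers already in place, refer to the incoming values through EXCLUDED in the SET clause, and let psycopg2.extras.execute_values send many rows per INSERT. The helpers quote_ident_sketch, build_upsert_sql, and upsert_with_execute_values are hypothetical names; quote_ident_sketch is a simplified stand-in, and real code would probably use psycopg2's quote_ident or psycopg2.sql.Identifier instead.

```python
def quote_ident_sketch(name):
    """Hypothetical stand-in for identifier quoting: wrap in double quotes
    and double any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def build_upsert_sql(table_name, column_names):
    """Build the upsert statement once, with a single %s for the VALUES list.

    Assumption: no identifier contains a '%' character, since the result
    is later used as a psycopg2 query template.
    """
    cols = [quote_ident_sketch(c) for c in column_names]
    col_list = ", ".join(["timestamp_utc"] + cols)
    # EXCLUDED refers to the row proposed for insertion, so the values
    # do not have to be passed a second time for the SET clause.
    assignments = ", ".join(f"{c} = EXCLUDED.{c}" for c in cols)
    return (
        f"INSERT INTO {quote_ident_sketch(table_name)} ({col_list}) VALUES %s "
        f"ON CONFLICT (timestamp_utc) DO UPDATE SET {assignments}"
    )


def upsert_with_execute_values(con, table_name, timestamps, column_names,
                               value_list, page_size=1000):
    # Imported here so that build_upsert_sql can be tried without psycopg2.
    from psycopg2.extras import execute_values

    query = build_upsert_sql(table_name, column_names)
    rows = [(ts, *vs) for ts, vs in zip(timestamps, value_list)]
    with con.cursor() as cur:
        # One multi-row INSERT per page instead of one INSERT per row.
        execute_values(cur, query, rows, page_size=page_size)
    con.commit()


print(build_upsert_sql("my_table", ["a", "b"]))
```

This assumes the timestamps are unique within each page, because PostgreSQL rejects an ON CONFLICT DO UPDATE statement that would touch the same row twice. Whether it closes the 8x gap would need to be measured.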