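I've built a web UI for an ETL application that lets users select CSV and TSV files containing large numbers of records, which I'm trying to insert into a PostgreSQL database. As others have already commented, this process is rather slow. After some research, it looks like the UNNEST function is the answer, but I'm having trouble implementing it. Honestly, I haven't found a good walkthrough for any kind of data processing in Python.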
Here is the SQL string I store the query in (it is passed to a function later):
salesorder_write = """
    INSERT INTO api.salesorder (
        site,
        sale_type,
        sales_rep,
        customer_number,
        shipto_number,
        cust_po_number,
        fob,
        order_number
    ) VALUES (
        UNNEST(ARRAY %s)
    )
"""
I use this string with a list of tuples, built like this:
for order in orders:
    inputs = (
        order['site'],
        order['sale_type'],
        order['sales_rep'],
        order['customer_number'],
        order['shipto_number'],
        order['cust_po_number'],
        order['fob'],
        order['order_number']
    )
    tup_list.append(inputs)
cur.execute(strSQL, tup_list)
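This gives me the error "Not all arguments converted during string formatting". My first question is how to build my SQL so that I can pass my list of tuples. My second is whether I can use my existing dictionary structure in the same way.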
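The error itself can be reproduced without a database. psycopg2 fills `%s` placeholders following the same rules as Python's `%` operator: each placeholder consumes one element of the parameter sequence. Passing the whole `tup_list` to a statement with a single `%s` therefore leaves every tuple after the first unconsumed. A minimal sketch in plain Python; the table name and values are placeholders, not the asker's data:

```python
# A statement with exactly one %s placeholder.
sql = "INSERT INTO t (site) VALUES (%s)"

# Two row tuples, i.e. two parameters, but only one placeholder to fill.
tup_list = [("x", "y"), ("a", "b")]

try:
    # The first %s consumes ("x", "y"); ("a", "b") is left over.
    sql % tuple(tup_list)
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # not all arguments converted during string formatting
```

So either the statement needs one placeholder per supplied value, or a helper has to expand the rows into the statement.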
Answer (score: 1)
unnest is not superior to execute_values, which is now (since Psycopg 2.7) the canonical approach:
from psycopg2.extras import execute_values

orders = [
    dict(
        site='x',
        sale_type='y',
        sales_rep='z',
        customer_number=1,
        shipto_number=2,
        cust_po_number=3,
        fob=4,
        order_number=5
    )
]

salesorder_write = """
    insert into t (
        site,
        sale_type,
        sales_rep,
        customer_number,
        shipto_number,
        cust_po_number,
        fob,
        order_number
    ) values %s
"""

# `cursor` is an open psycopg2 cursor. execute_values replaces the single %s
# with a VALUES list of up to page_size rows per statement. The template uses
# named placeholders, so the existing list of dicts can be passed as-is.
execute_values(
    cursor,
    salesorder_write,
    orders,
    template="""(
        %(site)s,
        %(sale_type)s,
        %(sales_rep)s,
        %(customer_number)s,
        %(shipto_number)s,
        %(cust_po_number)s,
        %(fob)s,
        %(order_number)s
    )""",
    page_size=1000
)
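If you still want the UNNEST route the question asked about, the statement takes one array parameter per column rather than one value per row, and psycopg2 adapts a Python list to a PostgreSQL ARRAY. The following is a minimal sketch under stated assumptions: the column list comes from the question, but the element types in `ARRAY_TYPES` are guesses that must be adjusted to the real `api.salesorder` definition, and `build_unnest_insert`, `dicts_to_columns` and `insert_with_unnest` are hypothetical helper names. The `operator.itemgetter` step shows that the existing list of dicts can be fed in directly.

```python
from operator import itemgetter

# Column order must match the INSERT column list (taken from the question).
COLUMNS = (
    "site", "sale_type", "sales_rep", "customer_number",
    "shipto_number", "cust_po_number", "fob", "order_number",
)

# ASSUMPTION: hypothetical PostgreSQL array types, one per column.
# The casts also let unnest() resolve types when a list is empty.
ARRAY_TYPES = (
    "text[]", "text[]", "text[]", "int[]",
    "int[]", "text[]", "text[]", "int[]",
)


def build_unnest_insert(table, columns, array_types):
    """Build INSERT ... SELECT * FROM unnest(...) with one %s per column."""
    if len(columns) != len(array_types):
        raise ValueError("need exactly one array type per column")
    placeholders = ", ".join(f"%s::{t}" for t in array_types)
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"SELECT * FROM unnest({placeholders})"
    )


def dicts_to_columns(rows, columns):
    """Transpose a list of dicts into one Python list per column."""
    getter = itemgetter(*columns)  # returns a tuple in column order
    tuples = [getter(row) for row in rows]
    if not tuples:
        return [[] for _ in columns]
    return [list(col) for col in zip(*tuples)]


def insert_with_unnest(cursor, table, rows):
    """Hypothetical helper: insert all rows with a single statement."""
    sql = build_unnest_insert(table, COLUMNS, ARRAY_TYPES)
    # Each parameter is a whole column; psycopg2 adapts it to an ARRAY.
    cursor.execute(sql, dicts_to_columns(rows, COLUMNS))
```

A call would look like `insert_with_unnest(cur, "api.salesorder", orders)`. The table and column names are interpolated with f-strings, so they must be trusted constants; for user-supplied identifiers, `psycopg2.sql.Identifier` is the safer tool. The answer above argues this is not faster than `execute_values`, so it is worth timing both on your own files.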