Question

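I have a csv file of customer ids (CRM_id). I need to get their primary keys (an autoincrement int) from the customers table of the database. (I can't be sure of the integrity of the CRM_ids, so I chose not to make that column the primary key.)

So: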

import csv

with open("CRM_ids.csv", 'r', newline='') as csvfile:
    customerfile = csv.DictReader(csvfile, delimiter=',', quotechar='"', skipinitialspace=True)
    # only one "CRM_id" field per row
    customers = list(customerfile)

So far so good? I think this is the most Pythonic way of doing it (but I'm happy to hear of other ways).

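Now comes the ugly code. It works, but I hate appending to the list because it has to copy and reallocate the memory on each loop, right? Is there a better way (pre-allocate + enumerate to keep track of the index, or perhaps a quicker/better way with some clever SQL, so as not to run several thousand separate queries...)?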

import sys
import mysql.connector

cnx = mysql.connector.connect(user='me', password=sys.argv[1], host="localhost", database="mydb")
cursor = cnx.cursor()
select_customer = "SELECT id FROM customers WHERE CRM_id = %(CRM_id)s LIMIT 1;"
c_ids = []
for row in customers:
    cursor.execute(select_customer, row)
    # fetchall() returns a list of tuples, one per matching row (at most one here)
    c_ids.extend(cursor.fetchall())
# the SELECTed set only has a single column, so take element [0] of each tuple
c_ids = [c[0] for c in c_ids]

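Edit: The purpose is to get the primary keys into a list so I can use them to allocate some other data from other CSV files into linked tables (the customer id primary keys are foreign keys in those other tables, and the allocation algorithm changes, so it's best to keep the flexibility of doing the allocation in Python rather than hard-coding SQL queries). I know this sounds a bit backwards, but the "client" only works with spreadsheets rather than an ERP/PLM, so I have to build the "relations" for this small app myself.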

Answer 1

How about changing your query to get what you want?

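crm_ids = [c["CRM_id"] for c in customers]
placeholders = ", ".join(["%s"] * len(crm_ids))  # one %s per value, filled in by the driver
select_customer = "SELECT DISTINCT id FROM customers WHERE CRM_id IN (%s);" % placeholders
cursor.execute(select_customer, crm_ids)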

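According to {{3}}, MySQL should be fine even with a multi-megabyte query. If the list gets really long, you can always break it up: two or three queries are guaranteed to be much faster than a few thousand.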
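As a rough sketch of what "breaking it up" could look like, the helper below runs one IN (...) query per chunk of ids. The names `chunked` and `fetch_ids` and the default `chunk_size` of 1000 are made up for illustration, and the cursor is assumed to be a DB-API cursor that uses `%s` placeholders, as mysql.connector does. It also selects CRM_id alongside id so each key can be matched back to its CSV row, which is a small departure from the query above.

```python
from itertools import islice


def chunked(values, size):
    """Yield successive lists of at most `size` items from `values`."""
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fetch_ids(cursor, crm_ids, chunk_size=1000):
    """Return a {CRM_id: id} dict, issuing one IN (...) query per chunk.

    `cursor` is assumed to be a DB-API cursor with %s-style placeholders.
    """
    id_by_crm = {}
    for chunk in chunked(crm_ids, chunk_size):
        # Build "%s, %s, ..." so the driver escapes every value itself.
        placeholders = ", ".join(["%s"] * len(chunk))
        query = (
            "SELECT id, CRM_id FROM customers WHERE CRM_id IN (%s)"
            % placeholders
        )
        cursor.execute(query, tuple(chunk))
        for pk, crm_id in cursor.fetchall():
            id_by_crm[crm_id] = pk
    return id_by_crm
```

With the `customers` list from the question, this could be called as `fetch_ids(cursor, [c["CRM_id"] for c in customers])`. CRM_ids that do not exist in the table are simply absent from the returned dict.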

Answer 2

How about storing your csv in a dict instead of a list:

customers = [c for c in customerfile]

becomes:

customers = {c['CRM_id']:c for c in customerfile}

Then select the entire cross-reference:

cursor.execute('SELECT id, CRM_id FROM customers')
result = cursor.fetchall()  # execute() itself does not return the rows

And add the new rowid as a new entry in the dict:

for row in result:
    # skip database rows whose CRM_id is not in the csv
    if row[1] in customers:
        customers[row[1]]['newid'] = row[0]
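Put together, the dict approach might look roughly like the sketch below. The function names `load_crm_ids` and `attach_ids` are hypothetical, and the `str()` conversion is an assumption: csv values are always strings, so if the CRM_id column in the database is numeric the keys would otherwise never match. The second function also reports which csv rows found no match, since the question mentions that the CRM_ids may not be reliable.

```python
import csv
import io


def load_crm_ids(csvfile):
    """Read the csv into a dict keyed by CRM_id (keys are strings)."""
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    return {row["CRM_id"]: row for row in reader}


def attach_ids(customers, db_rows):
    """Store each primary key under 'newid' and return unmatched CRM_ids.

    `db_rows` is any iterable of (id, CRM_id) pairs, for example the
    result of cursor.fetchall() after the SELECT above.
    """
    for pk, crm_id in db_rows:
        key = str(crm_id)  # assumption: normalise to the csv's string keys
        if key in customers:
            customers[key]["newid"] = pk
    # csv rows that never received a primary key
    return [k for k, row in customers.items() if "newid" not in row]


# Small in-memory demonstration with made-up data instead of a real file.
demo = load_crm_ids(io.StringIO("CRM_id\nA1\nB2\nC3\n"))
unmatched = attach_ids(demo, [(10, "A1"), (11, "B2"), (12, "ZZ")])
```

This costs a single query and one pass over the result, at the price of pulling the whole customers table into Python, which is reasonable for a few thousand rows.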

Most Pythonic (Python 3) way to repeatedly SELECT from a MySQL database
