Question

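I'm new to programming, so if the logic in the following program doesn't make sense, that's probably why. Fortunately, the code below runs and does everything I need, but it feels like it takes a long time to execute (6 minutes per 10,000 records).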

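The purpose of the program is to assign new IDs to the records in my database, and to let the user specify the increment and the starting point for those IDs.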

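To be honest, I'm not entirely sure the execution time is unreasonable, since I don't have much experience to judge it against, but if there is a way to speed it up, I'm all ears.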

# generates study IDs for MS Access dataset

import pyodbc
import random
import time

startTime = time.time()

dbFile = 'C:\Backend.accdb'
conn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
                       + 'DBQ=' + dbFile + '; Provider=MSDASQL;')
cursor = conn.cursor()


# shuffle the existing IDs so the assignment of the new IDs is random
a = []
sql = "SELECT ID FROM Clients"

for row in cursor.execute(sql):
    a.append(row.ID)

print "\nIDs appended to list...\n"

random.shuffle(a)

print "\nlist shuffled\n"

# assign new IDs according to the conditions below
startPt = 900001
increment = 7
idList = {}

for i in a:
    idList[i] = startPt
    startPt += increment

# append new IDs to another table in the database
for j, k in idList.iteritems():
    sql = "INSERT INTO newID values ('%s', '%s')" %(j,k)
    cursor.execute(sql)
    conn.commit()

# close connection
cursor.close()
conn.close()

# calculate, in seconds, the time the program took to execute    
executionTime = str(time.time() - startTime)

print "completed. the program took %s seconds to execute." %executionTime

Answer 1

# shuffle the existing IDs so the assignment of the new IDs is random
a = []
sql = "SELECT ID FROM Clients"

for row in cursor.execute(sql):
    a.append(row.ID)

If you want everything in a list, use cursor.fetchall(), which builds the list for you.
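A minimal sketch of the fetchall() suggestion. The in-memory sqlite3 database is only a stand-in (an assumption) so that the snippet runs without an Access file; the cursor calls have the same shape with pyodbc. Note that fetchall() returns a list of rows rather than bare IDs, so each ID still has to be pulled out of its row.

```python
import random
import sqlite3

# Stand-in database (assumption): sqlite3 replaces the Access file
# so that this sketch runs anywhere.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE Clients (ID INTEGER)")
cursor.executemany("INSERT INTO Clients VALUES (?)", [(n,) for n in range(1, 6)])

# fetchall() returns every remaining row as a list; each row is indexable.
rows = cursor.execute("SELECT ID FROM Clients").fetchall()
a = [row[0] for row in rows]

# Shuffle in place, as in the original program.
random.shuffle(a)
conn.close()
```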

print "\nIDs appended to list...\n"

random.shuffle(a)

print "\nlist shuffled\n"

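You should be able to change your query to something like SELECT ID FROM Clients ORDER BY RAND() so that the database does the shuffling for you (the name of the random function varies by database). That way you don't have to shuffle the list yourself, and it may be faster.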

for i in a:
    idList[i] = startPt
    startPt += increment

Why store the data in a dictionary only to iterate over it immediately afterwards?
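One way to skip the dictionary is to build the (old ID, new ID) pairs directly as a list. This is only a sketch: the function name assign_ids and its parameter names are hypothetical, while the default start and increment are the values from the question.

```python
import random


def assign_ids(old_ids, start=900001, increment=7):
    """Return (old_id, new_id) pairs with new IDs handed out in random order."""
    shuffled = list(old_ids)  # copy so the caller's list is left untouched
    random.shuffle(shuffled)
    # The n-th shuffled ID gets start + n * increment.
    return [(old, start + n * increment) for n, old in enumerate(shuffled)]


pairs = assign_ids([11, 12, 13])
```

A list of pairs like this can be handed to the insert step as-is, with no intermediate dictionary.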

# append new IDs to another table in the database
for j, k in idList.iteritems():
    sql = "INSERT INTO newID values ('%s', '%s')" %(j,k)
    cursor.execute(sql)

You should always use parameters rather than string formatting:

 cursor.execute("INSERT INTO newID values(?,?)", (j, k))

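This prevents SQL injection. You can also use the executemany function. It lets you pass a list of parameter sets and runs the same query once for each of them. That is probably the fastest way to load the data.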

    conn.commit()

You shouldn't commit after every insert. Usually you wait until all the inserts are done and then commit once.
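Put together, the last three suggestions (parameters, executemany, a single commit) might look like the sketch below. It uses an in-memory sqlite3 database as a stand-in (an assumption) so it can run without Access; the executemany and commit calls have the same shape in pyodbc. The column names are hypothetical, since the question does not show the newID schema. How much time this saves depends on the driver, so it is worth measuring.

```python
import sqlite3

# Stand-in database (assumption): replaces the Access file for this sketch.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Hypothetical column names; the original post does not show the schema.
cursor.execute("CREATE TABLE newID (oldID INTEGER, studyID INTEGER)")

# (old ID, new ID) pairs, as produced by the assignment loop.
pairs = [(17, 900001), (4, 900008), (23, 900015)]

# One parameterized statement, executed once per pair.
cursor.executemany("INSERT INTO newID VALUES (?, ?)", pairs)

# Commit a single time, after all rows have been inserted.
conn.commit()

inserted = cursor.execute("SELECT COUNT(*) FROM newID").fetchone()[0]
conn.close()
```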

Answer 2

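You are inserting the IDs into the database one at a time. You could insert everything in one go with a single large query, provided your database accepts multi-row VALUES lists: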

"INSERT INTO newID values (123, 123), (456, 456), (789, 789)" (and so on)

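This means you need to build the query string first and then execute it. If the code is still slow after that, use a Python profiler to see which part is the bottleneck.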
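For the profiling step, the standard library's cProfile module is one option. The sketch below is written for Python 3, and the functions slow_part and main are hypothetical placeholders for the real program.

```python
import cProfile
import io
import pstats


def slow_part():
    # Placeholder workload standing in for the database inserts.
    return sum(i * i for i in range(100000))


def main():
    slow_part()


# Collect profiling data only while main() runs.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Write the ten most expensive entries, by cumulative time, into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The functions with the largest cumulative time in the report are the first places to look.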

Answer 3

I suggest printing out how long each block of code takes.
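A small helper along these lines could do the per-block timing. It is a sketch only: the name timed and the timings dictionary are hypothetical, and the shuffle stands in for any block of the real program.

```python
import random
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds


@contextmanager
def timed(label):
    """Measure and print how long the enclosed block takes."""
    start = time.time()
    try:
        yield
    finally:
        timings[label] = time.time() - start
        print("%s took %.3f seconds" % (label, timings[label]))


ids = list(range(10000))
with timed("shuffle"):
    random.shuffle(ids)
```

Wrapping the select, shuffle, and insert blocks separately shows which one accounts for most of the six minutes.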

I expect the slowest part to be the inserts into newID, especially if the table has a primary key.

I suggest using "execute many" for the inserts, so that they are all done in one go.

In fact, pyodbc appears to have this feature:

executemany

cursor.executemany(sql, seq_of_parameters) --> None

Executes the same SQL statement for each set of parameters. seq_of_parameters is a sequence of sequences.

params = [ ('A', 1), ('B', 2) ]
cursor.executemany("insert into t(name, id) values (?, ?)", params)
This will execute the SQL statement twice, once with ('A', 1) and once with ('B', 2).

See http://code.google.com/p/pyodbc/wiki/Cursor

Is there a way to speed up this Python program? (short)
