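I am trying to open pickle files containing data and then use that data to update an MSSQL table. Updating 1,000,000 rows was taking 10 days, so I wrote a script that does more of the work in parallel. The more processes I run it with, the more of this error I get: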
(<class 'pyodbc.Error'>, Error('40001', '[40001] [Microsoft][ODBC SQL Server Dri
ver][SQL Server]Transaction (Process ID 93) was deadlocked on lock resources wit
h another process and has been chosen as the deadlock victim. Rerun the transact
ion. (1205) (SQLExecDirectW)'), <traceback object at 0x0000000002791808>)
As you can see in my code, I keep retrying the update until it succeeds, even sleeping for a second between attempts:
while True:
    try:
        updated = cursor.execute(update,'Yes', fileName+'.'+ext, dt, size,uniqueID )
        break
    except:
        time.sleep(1)
        print sys.exc_info()
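Is this because the multiprocessing module uses os.spawn instead of os.fork when it runs on Windows?
Is there a way to speed this up?
I have been told that the table can handle many more transactions than this...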
#!C:/Python/python.exe -u
import pyodbc,re,pickle,os,glob,sys,time
from multiprocessing import Lock, Process, Queue, current_process

def UpDater(pickleQueue):
    for pi in iter(pickleQueue.get, 'STOP'):
        name = current_process().name
        f=pi
        cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=database.windows.net;DATABASE=DB;UID=user;PWD=pwd');
        cursor = cnxn.cursor()
        update = ("""UPDATE DocumentList
                SET Downloaded=?, DownLoadedAs=?,DownLoadedWhen=?,DownLoadedSizeKB=?
                WHERE DocNumberSequence=?""")
        r = re.compile('\d+')
        pkl_file = open(pi, 'rb')
        meta = pickle.load(pkl_file)
        fileName = meta[0][0]
        pl = r.findall(fileName)
        l= int(len(pl)-1)
        ext = meta[0][1]
        url = meta[0][2]
        uniqueID = pl[l]
        dt = meta[0][4]
        size = meta[0][5]
        while True:
            try:
                updated = cursor.execute(update,'Yes', fileName+'.'+ext, dt, size,uniqueID )
                break
            except:
                time.sleep(1)
                print sys.exc_info()
        print uniqueID
        cnxn.commit()
        pkl_file.close()
        os.remove(fileName+'.pkl')
        cnxn.close()

if __name__ == '__main__':
    os.chdir('Pickles')
    pickles = glob.glob("*.pkl")
    pickleQueue=Queue();processes =[];
    for item in pickles:
        pickleQueue.put(item)
    workers = int(sys.argv[1]);
    for x in xrange(workers):
        p = Process(target=UpDater,args=(pickleQueue,))
        p.start()
        processes.append(p)
        pickleQueue.put('STOP')
    for p in processes:
        p.join()
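I am using Windows 7 and Python 2.7 (Anaconda distribution).
Edit: The answer below, using row locks, stopped the errors from occurring. However, the updates were still slow. It turned out that a good old-fashioned index on the primary key was needed for a 100x speed-up.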
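To illustrate why the index mattered, here is a rough sketch that uses SQLite purely as a stand-in, because it ships with Python; SQL Server's query plans and locking behave differently. The column types and the index name IX_DocumentList_DocNumberSequence are assumptions made for the example. Without an index on the column in the WHERE clause the engine has to scan the table to find the row, while with one it can seek straight to it.

```python
import sqlite3


def lookup_plan(conn):
    """Return SQLite's query plan text for a lookup by DocNumberSequence."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT Downloaded FROM DocumentList WHERE DocNumberSequence = ?",
        (1,),
    ).fetchall()
    # The human-readable plan description is the fourth column of each row.
    return " ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Assumed, simplified schema: only the columns needed for the demonstration.
conn.execute(
    "CREATE TABLE DocumentList (DocNumberSequence INTEGER, Downloaded TEXT)"
)

plan_before = lookup_plan(conn)  # no index yet: expect a full table scan

# Hypothetical index name; the column is the one used in the WHERE clause.
conn.execute(
    "CREATE INDEX IX_DocumentList_DocNumberSequence "
    "ON DocumentList (DocNumberSequence)"
)
plan_after = lookup_plan(conn)  # the index allows a direct search

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so compare the two printed lines rather than relying on a specific string.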
Answer 0 (score: 2)
A few things to try. Using sleep is a bad idea. First, can you try row-level locking?
update = ("""UPDATE DocumentList WITH (ROWLOCK)
SET Downloaded=?, DownLoadedAs=?,DownLoadedWhen=?,DownLoadedSizeKB=?
WHERE DocNumberSequence=? """)
Another option is to wrap each update in its own transaction.
Would either of these solutions work for you?
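For the transaction option, a minimal sketch might look like the following. It treats each update plus its commit as one unit, and on a deadlock it rolls back and reruns the whole unit, which is what the error message asks for. The helper names run_in_transaction and is_deadlock, the attempt limit, and the delay values are all assumptions made for illustration. The only detail taken from the question is that the error carries '40001' as its first argument, as the traceback shows. The code is written so it also runs on Python 3.

```python
import random
import time


def is_deadlock(exc):
    # The traceback in the question shows Error('40001', '...'), so the
    # SQLSTATE is assumed to be the first element of exc.args.
    return bool(exc.args) and exc.args[0] == '40001'


def run_in_transaction(cnxn, sql, params, max_attempts=5, base_delay=0.05):
    """Execute one statement and commit it; on deadlock, roll back and retry.

    Returns the number of attempts that were needed. Any other error, or a
    deadlock on the final attempt, is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        cursor = cnxn.cursor()
        try:
            cursor.execute(sql, params)
            cnxn.commit()
            return attempt
        except Exception as exc:
            # Undo the failed attempt before deciding what to do next.
            cnxn.rollback()
            if not is_deadlock(exc) or attempt == max_attempts:
                raise
            # Short randomized delay that grows with each attempt, so that
            # parallel workers do not all retry at the same moment.
            time.sleep(base_delay * attempt * random.random())
```

In the question's worker this would replace the while True loop, for example run_in_transaction(cnxn, update, ('Yes', fileName+'.'+ext, dt, size, uniqueID)). Unlike the bare except with a fixed one-second sleep, it gives up after a bounded number of attempts and does not swallow unrelated errors.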