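I have 62 known files to process and am using Pool / imap to speed things up. Unexpectedly, and for no reason I can identify, imap starts handing the same filename to the same worker process several times. The code: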
import os
from multiprocessing import Pool, current_process

def worker_proc(file):
    procname = current_process().name
    info("{} {}".format(procname, file))
    return 1  # Normally this is the count of records processed.
files = ['UOUT.DBF', 'OUT2.DBF', 'OUT3.DBF', 'OUT4.DBF', 'OUT5.DBF', 'OUT6.DBF', 'OUT7.DBF', 'OUT8.DBF', 'OUT9.DBF', 'OUTA.DBF', 'OUTB.DBF', 'OUTC.DBF', 'OUTD.DBF', 'OUTE.DBF', 'OUTF.DBF', 'OUTG.DBF', 'OUTH.DBF', 'OUTI.DBF', 'OUTJ.DBF', 'OUTK.DBF', 'OUTL.DBF', 'OUTM.DBF', 'OUTN.DBF', 'OUTO.DBF', 'OUTP.DBF', 'OUTQ.DBF', 'OUTR.DBF', 'OUTS.DBF', 'NORD.DBF', 'ORD2.DBF', ...'ORDY.DBF']
with Pool(os.cpu_count() // 2) as p:
    ret = p.imap(worker_proc, files)
    for n, cnt in enumerate(ret):
        info("FILE {} of {} COMPLETE, cnt {}".format(n, len(files), cnt))
Here is the log produced by the info() call in worker_proc:
02:18:30 SpawnPoolWorker-8 UOUT.DBF
02:18:32 SpawnPoolWorker-12 UOUT2.DBF
02:18:33 SpawnPoolWorker-2 UOUT3.DBF
02:18:33 SpawnPoolWorker-13 UOUT4.DBF
02:18:35 SpawnPoolWorker-7 UOUT5.DBF
02:18:38 SpawnPoolWorker-3 UOUT6.DBF
02:18:42 SpawnPoolWorker-4 UOUT7.DBF
02:18:44 SpawnPoolWorker-1 UOUT8.DBF
02:18:48 SpawnPoolWorker-11 UOUT9.DBF
02:18:48 SpawnPoolWorker-6 UOUTA.DBF
02:18:53 SpawnPoolWorker-5 UOUTB.DBF
02:18:57 SpawnPoolWorker-10 UOUTC.DBF
02:18:58 SpawnPoolWorker-9 UOUTD.DBF
03:27:59 SpawnPoolWorker-12 UOUTE.DBF
03:33:07 SpawnPoolWorker-7 UOUT5.DBF
03:35:48 SpawnPoolWorker-3 UOUT6.DBF
03:42:58 SpawnPoolWorker-9 UOUTF.DBF
03:44:45 SpawnPoolWorker-13 UOUT4.DBF
03:50:04 SpawnPoolWorker-4 UOUTG.DBF
03:51:27 SpawnPoolWorker-11 UOUTH.DBF
03:54:43 SpawnPoolWorker-2 UOUT3.DBF
03:56:46 SpawnPoolWorker-6 UOUTI.DBF
03:56:47 SpawnPoolWorker-1 UOUTJ.DBF
03:57:23 SpawnPoolWorker-8 UOUT.DBF
04:00:28 SpawnPoolWorker-10 UOUTK.DBF
04:02:19 SpawnPoolWorker-5 UOUTL.DBF
05:00:56 SpawnPoolWorker-7 UOUT5.DBF
05:01:51 SpawnPoolWorker-2 UOUT3.DBF
05:03:00 SpawnPoolWorker-12 UOUTM.DBF
05:04:38 SpawnPoolWorker-13 UOUT4.DBF
05:08:12 SpawnPoolWorker-1 UOUTN.DBF
05:10:11 SpawnPoolWorker-5 UOUTO.DBF
05:10:30 SpawnPoolWorker-3 UOUTP.DBF
05:15:35 SpawnPoolWorker-10 UOUTQ.DBF
05:22:42 SpawnPoolWorker-4 UOUTR.DBF
05:24:22 SpawnPoolWorker-6 UOUTS.DBF
05:25:28 SpawnPoolWorker-11 ORD.DBF
05:27:51 SpawnPoolWorker-8 ORD2.DBF
...
You can see that after 'UOUTF.DBF' the pool starts feeding the same filenames to the same processes, which wreaks havoc in the rest of my program.
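To make the repeats easier to spot than by reading the log by eye, here is a minimal sketch that counts how often each worker logged each file. It assumes every entry has exactly three whitespace-separated fields (time, process name, filename), as in the log above; `repeated_starts` is a hypothetical helper name, not part of the original program, and filenames containing spaces would need a different split.

```python
from collections import Counter

def repeated_starts(log_lines):
    """Return {(worker, filename): count} for pairs logged more than once."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip "..." and anything not shaped like a log entry
        _timestamp, worker, filename = parts
        counts[(worker, filename)] += 1
    return {pair: n for pair, n in counts.items() if n > 1}

# A few entries copied from the log above.
sample = [
    "02:18:35 SpawnPoolWorker-7 UOUT5.DBF",
    "03:27:59 SpawnPoolWorker-12 UOUTE.DBF",
    "03:33:07 SpawnPoolWorker-7 UOUT5.DBF",
    "05:00:56 SpawnPoolWorker-7 UOUT5.DBF",
    "...",
]
print(repeated_starts(sample))
# {('SpawnPoolWorker-7', 'UOUT5.DBF'): 3}
```

A repeat that always involves the same worker and the same file suggests the worker is re-entering its own job rather than the pool dispatching the item again.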
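Update: This program has been running flawlessly in production for several months. With this particular workload, it fails every time. A full run takes about 7 hours on Win Server 2012. Unfortunately, when I comment out the body of the worker, as in the code above, it works fine.
Previous run (#7) for comparison: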
00:17:38 SpawnPoolWorker-5 UOUT.DBF
00:17:40 SpawnPoolWorker-2 UOUT2.DBF
00:17:42 SpawnPoolWorker-10 UOUT3.DBF
00:17:42 SpawnPoolWorker-13 UOUT4.DBF
00:17:43 SpawnPoolWorker-4 UOUT5.DBF
00:17:48 SpawnPoolWorker-11 UOUT6.DBF
00:17:53 SpawnPoolWorker-12 UOUT7.DBF
00:18:03 SpawnPoolWorker-1 UOUT8.DBF
00:18:05 SpawnPoolWorker-8 UOUT9.DBF
00:18:06 SpawnPoolWorker-7 UOUTA.DBF
00:18:09 SpawnPoolWorker-6 UOUTB.DBF
00:18:10 SpawnPoolWorker-3 UOUTC.DBF
00:18:11 SpawnPoolWorker-9 UOUTD.DBF
01:20:52 SpawnPoolWorker-8 UOUT9.DBF
01:20:53 SpawnPoolWorker-8 UOUTE.DBF
01:22:44 SpawnPoolWorker-13 UOUT4.DBF
01:22:44 SpawnPoolWorker-13 UOUTF.DBF
01:23:11 SpawnPoolWorker-5 UOUT.DBF
01:23:11 SpawnPoolWorker-5 UOUTG.DBF
...
Run #6 for comparison:
00:21:04 SpawnPoolWorker-2 UOUT.DBF
00:21:04 SpawnPoolWorker-6 UOUT2.DBF
00:21:05 SpawnPoolWorker-11 UOUT3.DBF
00:21:06 SpawnPoolWorker-12 UOUT4.DBF
00:21:06 SpawnPoolWorker-4 UOUT5.DBF
00:21:06 SpawnPoolWorker-3 UOUT6.DBF
00:21:07 SpawnPoolWorker-5 UOUT7.DBF
00:21:07 SpawnPoolWorker-9 UOUT8.DBF
00:21:07 SpawnPoolWorker-10 UOUT9.DBF
00:21:07 SpawnPoolWorker-1 UOUTA.DBF
00:21:08 SpawnPoolWorker-13 UOUTB.DBF
00:21:09 SpawnPoolWorker-7 UOUTC.DBF
00:21:10 SpawnPoolWorker-8 UOUTD.DBF
01:02:27 SpawnPoolWorker-9 UOUTE.DBF
01:17:58 SpawnPoolWorker-7 UOUTC.DBF
01:17:58 SpawnPoolWorker-7 UOUTF.DBF
01:18:09 SpawnPoolWorker-12 UOUT4.DBF
01:18:09 SpawnPoolWorker-12 UOUTG.DBF
01:18:24 SpawnPoolWorker-3 UOUT6.DBF
01:18:24 SpawnPoolWorker-3 UOUTH.DBF
01:19:55 SpawnPoolWorker-1 UOUTA.DBF
01:19:55 SpawnPoolWorker-1 UOUTI.DBF
01:21:20 SpawnPoolWorker-8 UOUTD.DBF
01:21:20 SpawnPoolWorker-8 UOUTJ.DBF
01:21:57 SpawnPoolWorker-11 UOUTK.DBF
01:24:52 SpawnPoolWorker-5 UOUTL.DBF
01:25:53 SpawnPoolWorker-13 UOUTM.DBF
01:29:55 SpawnPoolWorker-4 UOUTN.DBF
01:31:13 SpawnPoolWorker-10 UOUTO.DBF
01:31:57 SpawnPoolWorker-6 UOUTP.DBF
01:32:12 SpawnPoolWorker-2 UOUTQ.DBF
...
Final update: found the problem!
def worker_proc(file):
    procname = current_process().name
    info("{} {}".format(procname, file))
    ...
    try:
        engine.execute(ins)
    except Exception as e:
        if "deadlocked" in e.args[0] or "duplicate" in e.args[0]:
            warning("Deadlocked, restarting worker_proc {}".format(file))
            return worker_proc(file)
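I was overzealous in editing out code I thought was irrelevant, but I have pinned down the cause. imap was never handing out a file twice: worker_proc called itself again after an exception, so the same worker logged the same filename on every retry. The job should also not be restarted when it fails because of a duplicate. The duplicate errors appear to come from files being submitted twice, which mostly traces back to poor-quality input files.
You could also add a parameter like this: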
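A minimal sketch of keeping the two error kinds apart, so that only deadlocks trigger a retry. It assumes, as the snippet above does, that the error text contains the word "deadlocked" or "duplicate"; `is_retryable` is a hypothetical helper name, and the lower-casing is an addition that the original check did not have.

```python
def is_retryable(exc):
    """Return True only for errors worth retrying (deadlocks in this sketch)."""
    # exc.args can be empty or hold a non-string value, so normalise it first.
    message = str(exc.args[0]).lower() if exc.args else ""
    # Duplicate-key errors are deliberately not retried: running the same
    # insert again would only fail the same way.
    return "deadlocked" in message
```

In the except block, the worker would then retry when `is_retryable(e)` is true and re-raise otherwise, so a duplicate surfaces as a real failure instead of a silent restart.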
def worker_proc(file, attempt_no=0):
    ...
    except Exception as e:
        if attempt_no > 2:
            raise
        return worker_proc(file, attempt_no + 1)
This makes the recursion more explicit and caps the number of retries.
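The same cap can also be written as a loop rather than recursion. This is only a sketch under stated assumptions: `run_with_retries` and `MAX_ATTEMPTS` are hypothetical names, the standard `logging` module stands in for the `warning()` helper, and the limit of 4 mirrors what `attempt_no > 2` allows (one initial try plus three retries).

```python
import logging

MAX_ATTEMPTS = 4  # assumed: one initial try plus three retries

def run_with_retries(task, file, max_attempts=MAX_ATTEMPTS):
    """Call task(file) in a loop; re-raise the error after the last attempt."""
    for attempt_no in range(1, max_attempts + 1):
        try:
            return task(file)
        except Exception:
            if attempt_no == max_attempts:
                raise  # out of attempts, let the caller see the failure
            logging.warning("Attempt %d for %s failed, retrying", attempt_no, file)
```

A loop keeps the stack flat and puts the retry count in one place; the worker body itself then does the work once and lets exceptions propagate.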