multiprocessing.Pool imap feeds duplicate tasks to worker processes

Date: 2017-10-18 15:13:06

Tags: python multiprocessing

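I have 62 known files to process and am using Pool / imap to speed things up. Unexpectedly, and for no reason I can identify, imap starts handing the same filename to the same worker process multiple times. The code: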

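import os
from logging import info  # assumption: info() is the logging module's helper
from multiprocessing import Pool, current_process

def worker_proc(file):
    # Log which pool worker picked up which file.
    procname = current_process().name
    info("{} {}".format(procname, file))
    return 1  # Normally this is the count of records processed.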

files = ['UOUT.DBF', 'OUT2.DBF', 'OUT3.DBF', 'OUT4.DBF', 'OUT5.DBF', 'OUT6.DBF', 'OUT7.DBF', 'OUT8.DBF', 'OUT9.DBF', 'OUTA.DBF', 'OUTB.DBF', 'OUTC.DBF', 'OUTD.DBF', 'OUTE.DBF', 'OUTF.DBF', 'OUTG.DBF', 'OUTH.DBF', 'OUTI.DBF', 'OUTJ.DBF', 'OUTK.DBF', 'OUTL.DBF', 'OUTM.DBF', 'OUTN.DBF', 'OUTO.DBF', 'OUTP.DBF', 'OUTQ.DBF', 'OUTR.DBF', 'OUTS.DBF', 'NORD.DBF', 'ORD2.DBF', ...'ORDY.DBF']
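if __name__ == "__main__":  # needed on Windows, where pool workers are spawned
    with Pool(os.cpu_count() // 2) as p:
        ret = p.imap(worker_proc, files)
        # Count from 1 so the last message reads "FILE 62 of 62".
        for n, cnt in enumerate(ret, 1):
            info("FILE {} of {} COMPLETE, cnt {}".format(n, len(files), cnt))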

This is the log produced by the info() call in worker_proc:

02:18:30 SpawnPoolWorker-8  UOUT.DBF
02:18:32 SpawnPoolWorker-12 UOUT2.DBF
02:18:33 SpawnPoolWorker-2  UOUT3.DBF
02:18:33 SpawnPoolWorker-13 UOUT4.DBF
02:18:35 SpawnPoolWorker-7  UOUT5.DBF
02:18:38 SpawnPoolWorker-3  UOUT6.DBF
02:18:42 SpawnPoolWorker-4  UOUT7.DBF
02:18:44 SpawnPoolWorker-1  UOUT8.DBF
02:18:48 SpawnPoolWorker-11 UOUT9.DBF
02:18:48 SpawnPoolWorker-6  UOUTA.DBF
02:18:53 SpawnPoolWorker-5  UOUTB.DBF
02:18:57 SpawnPoolWorker-10 UOUTC.DBF
02:18:58 SpawnPoolWorker-9  UOUTD.DBF
03:27:59 SpawnPoolWorker-12 UOUTE.DBF
03:33:07 SpawnPoolWorker-7  UOUT5.DBF
03:35:48 SpawnPoolWorker-3  UOUT6.DBF
03:42:58 SpawnPoolWorker-9  UOUTF.DBF
03:44:45 SpawnPoolWorker-13 UOUT4.DBF
03:50:04 SpawnPoolWorker-4  UOUTG.DBF
03:51:27 SpawnPoolWorker-11 UOUTH.DBF
03:54:43 SpawnPoolWorker-2  UOUT3.DBF
03:56:46 SpawnPoolWorker-6  UOUTI.DBF
03:56:47 SpawnPoolWorker-1  UOUTJ.DBF
03:57:23 SpawnPoolWorker-8  UOUT.DBF
04:00:28 SpawnPoolWorker-10 UOUTK.DBF
04:02:19 SpawnPoolWorker-5  UOUTL.DBF
05:00:56 SpawnPoolWorker-7  UOUT5.DBF
05:01:51 SpawnPoolWorker-2  UOUT3.DBF
05:03:00 SpawnPoolWorker-12 UOUTM.DBF
05:04:38 SpawnPoolWorker-13 UOUT4.DBF
05:08:12 SpawnPoolWorker-1  UOUTN.DBF
05:10:11 SpawnPoolWorker-5  UOUTO.DBF
05:10:30 SpawnPoolWorker-3  UOUTP.DBF
05:15:35 SpawnPoolWorker-10 UOUTQ.DBF
05:22:42 SpawnPoolWorker-4  UOUTR.DBF
05:24:22 SpawnPoolWorker-6  UOUTS.DBF
05:25:28 SpawnPoolWorker-11 ORD.DBF
05:27:51 SpawnPoolWorker-8  ORD2.DBF
...

You can see that after 'UOUTF.DBF' the pool starts feeding the same filenames to the same processes, which wreaks havoc in the rest of my program.
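To see how widespread the repeats are, the log can be tallied per (worker, file) pair. This is only a minimal sketch, assuming the three-column log format shown above; `LOG` and `repeated_tasks` are names made up for this illustration, and the sample holds just a few lines copied from the first log.

```python
from collections import Counter

# A few lines copied from the first log above (not the full log).
LOG = """\
02:18:35 SpawnPoolWorker-7  UOUT5.DBF
02:18:38 SpawnPoolWorker-3  UOUT6.DBF
03:27:59 SpawnPoolWorker-12 UOUTE.DBF
03:33:07 SpawnPoolWorker-7  UOUT5.DBF
03:35:48 SpawnPoolWorker-3  UOUT6.DBF
05:00:56 SpawnPoolWorker-7  UOUT5.DBF
"""


def repeated_tasks(log_text):
    """Return {(worker, filename): count} for pairs logged more than once."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and "..." markers
        _timestamp, worker, filename = parts
        counts[(worker, filename)] += 1
    return {pair: n for pair, n in counts.items() if n > 1}


if __name__ == "__main__":
    for (worker, filename), n in sorted(repeated_tasks(LOG).items()):
        print(worker, filename, n)
```

A repeat that always lands on the same worker, as it does here, hints that the second log line may come from inside that worker rather than from the pool dispatching the task again.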

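Update: this program has been working flawlessly in production for months. With this particular workload it fails every time. A full run takes about 7 hours on Win Server 2012. Unfortunately, when I comment out the worker code, as in the code above, it works fine.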

The previous run (#7), for comparison:

00:17:38 SpawnPoolWorker-5  UOUT.DBF
00:17:40 SpawnPoolWorker-2  UOUT2.DBF
00:17:42 SpawnPoolWorker-10 UOUT3.DBF
00:17:42 SpawnPoolWorker-13 UOUT4.DBF
00:17:43 SpawnPoolWorker-4  UOUT5.DBF
00:17:48 SpawnPoolWorker-11 UOUT6.DBF
00:17:53 SpawnPoolWorker-12 UOUT7.DBF
00:18:03 SpawnPoolWorker-1  UOUT8.DBF
00:18:05 SpawnPoolWorker-8  UOUT9.DBF
00:18:06 SpawnPoolWorker-7  UOUTA.DBF
00:18:09 SpawnPoolWorker-6  UOUTB.DBF
00:18:10 SpawnPoolWorker-3  UOUTC.DBF
00:18:11 SpawnPoolWorker-9  UOUTD.DBF
01:20:52 SpawnPoolWorker-8  UOUT9.DBF
01:20:53 SpawnPoolWorker-8  UOUTE.DBF
01:22:44 SpawnPoolWorker-13 UOUT4.DBF
01:22:44 SpawnPoolWorker-13 UOUTF.DBF
01:23:11 SpawnPoolWorker-5  UOUT.DBF
01:23:11 SpawnPoolWorker-5  UOUTG.DBF
...

Run #6, for comparison:

00:21:04 SpawnPoolWorker-2  UOUT.DBF
00:21:04 SpawnPoolWorker-6  UOUT2.DBF
00:21:05 SpawnPoolWorker-11 UOUT3.DBF
00:21:06 SpawnPoolWorker-12 UOUT4.DBF
00:21:06 SpawnPoolWorker-4  UOUT5.DBF
00:21:06 SpawnPoolWorker-3  UOUT6.DBF
00:21:07 SpawnPoolWorker-5  UOUT7.DBF
00:21:07 SpawnPoolWorker-9  UOUT8.DBF
00:21:07 SpawnPoolWorker-10 UOUT9.DBF
00:21:07 SpawnPoolWorker-1  UOUTA.DBF
00:21:08 SpawnPoolWorker-13 UOUTB.DBF
00:21:09 SpawnPoolWorker-7  UOUTC.DBF
00:21:10 SpawnPoolWorker-8  UOUTD.DBF
01:02:27 SpawnPoolWorker-9  UOUTE.DBF
01:17:58 SpawnPoolWorker-7  UOUTC.DBF
01:17:58 SpawnPoolWorker-7  UOUTF.DBF
01:18:09 SpawnPoolWorker-12 UOUT4.DBF
01:18:09 SpawnPoolWorker-12 UOUTG.DBF
01:18:24 SpawnPoolWorker-3  UOUT6.DBF
01:18:24 SpawnPoolWorker-3  UOUTH.DBF
01:19:55 SpawnPoolWorker-1  UOUTA.DBF
01:19:55 SpawnPoolWorker-1  UOUTI.DBF
01:21:20 SpawnPoolWorker-8  UOUTD.DBF
01:21:20 SpawnPoolWorker-8  UOUTJ.DBF
01:21:57 SpawnPoolWorker-11 UOUTK.DBF
01:24:52 SpawnPoolWorker-5  UOUTL.DBF
01:25:53 SpawnPoolWorker-13 UOUTM.DBF
01:29:55 SpawnPoolWorker-4  UOUTN.DBF
01:31:13 SpawnPoolWorker-10 UOUTO.DBF
01:31:57 SpawnPoolWorker-6  UOUTP.DBF
01:32:12 SpawnPoolWorker-2  UOUTQ.DBF
...

Final update: found the problem!

def worker_proc(file):
    procname = current_process().name
    info("{} {}".format(procname, file))
    ...
    try:
        engine.execute(ins)
    except Exception as e:
        if "deadlocked" in e.args[0] or "duplicate" in e.args[0]:
            warning("Deadlocked, restarting worker_proc {}".format(file))
            return worker_proc(file)

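I was overzealous in editing out what I thought was unrelated code, but I have now pinned down the cause: worker_proc calls itself again when the insert fails, so the same worker logs the same filename a second time. The job also should not be restarted when it fails because of a duplicate. The duplicates appear to come from files being submitted twice, and the underlying cause is mostly poor-quality input files.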
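The effect can be reproduced without a database or a pool. The sketch below is an assumption-laden stand-in, not the production code: `fake_insert`, `failures`, and `log` are hypothetical names, the insert is faked so that one file fails twice with a "duplicate" error, and the built-in `map` replaces `Pool.imap` so each file is provably handed out exactly once.

```python
log = []                      # stands in for the info() log
failures = {"UOUT5.DBF": 2}   # hypothetical: this file's insert fails twice


def fake_insert(file):
    """Pretend database insert that raises while failures remain."""
    if failures.get(file, 0) > 0:
        failures[file] -= 1
        raise RuntimeError("duplicate key")


def worker_proc(file):
    log.append(file)  # same position as the info() call in the real worker
    try:
        fake_insert(file)
    except Exception as e:
        if "deadlocked" in e.args[0] or "duplicate" in e.args[0]:
            return worker_proc(file)  # re-enters the function and logs again
    return 1


# Each file is submitted exactly once, yet one of them is logged three times.
results = list(map(worker_proc, ["UOUT4.DBF", "UOUT5.DBF"]))
print(log)
```

The extra log entries come from the recursive call, which is consistent with the repeated filenames always showing up on the worker that first received them.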

You could also add a parameter like this:

def worker_proc(file, attempt_no=0):
    ...
    except Exception as e:
        if attempt_no > 2:
            raise
        return worker_proc(file, attempt_no + 1)

This makes the recursion more explicit and limits the number of retries.
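The same idea can be written as a loop instead of recursion, retrying only on deadlocks and letting duplicate errors propagate. This is a sketch under stated assumptions: `run_with_retry`, `is_deadlock`, and `MAX_ATTEMPTS` are hypothetical names, and a deadlock is recognised by the same substring check on `e.args[0]` that the worker above uses.

```python
import time

MAX_ATTEMPTS = 3  # assumed limit, not taken from the original program


def is_deadlock(exc):
    """Substring check mirroring the worker's test on e.args[0]."""
    return bool(exc.args) and "deadlocked" in str(exc.args[0])


def run_with_retry(operation, max_attempts=MAX_ATTEMPTS, delay=0.0):
    """Call operation(); retry only on deadlocks, up to max_attempts calls."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Duplicates and other errors are re-raised immediately, and so
            # is a deadlock once the attempts are used up.
            if not is_deadlock(exc) or attempt == max_attempts:
                raise
            time.sleep(delay)  # optional pause before the next attempt
```

Inside worker_proc the insert could then be wrapped as `run_with_retry(lambda: engine.execute(ins))`, which keeps the info() call outside the retried part so each file is logged once.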

0 answers:

No answers yet