Missing lines when writing a file in Python with a multiprocessing Lock

Time: 2016-07-28 03:53:22

Tags: python python-multiprocessing

Here is my code:

from multiprocessing import Pool, Lock
from datetime import datetime as dt

console_out = "/STDOUT/Console.out"
chunksize = 50
lock = Lock()

def writer(message):
    lock.acquire()
    with open(console_out, 'a') as out:
        out.write(message)
        out.flush()
    lock.release()

def conf_wrapper(state):
    import ProcessingModule as procs
    import sqlalchemy as sal

    stcd, nrows = state
    engine = sal.create_engine('postgresql://foo:bar@localhost:5432/schema')

    writer("State {s} started  at: {n}"
           "\n".format(s=str(stcd).zfill(2), n=dt.now()))

    with engine.connect() as conn, conn.begin():
        procs.processor(conn, stcd, nrows, chunksize)

    writer("\tState {s} finished  at: {n}"
           "\n".format(s=str(stcd).zfill(2), n=dt.now()))

def main():
    nprocesses = 12
    maxproc = 1
    state_list = [(2, 113), (10, 119), (15, 84), (50, 112), (44, 110), (11, 37), (33, 197)]

    with open(console_out, 'w') as out:
        out.write("Starting at {n}\n".format(n=dt.now()))
        out.write("Using {p} processes..."
                  "\n".format(p=nprocesses))

    with Pool(processes=int(nprocesses), maxtasksperchild=maxproc) as pool:
        pool.map(func=conf_wrapper, iterable=state_list, chunksize=1)

    with open(console_out, 'a') as out:
        out.write("\nAll done at {n}".format(n=dt.now()))

The file console_out never contains all 7 states. It always misses one or more of them. Here is the output from the latest run:

Starting at 2016-07-27 21:46:58.638587
Using 12 processes...
State 44 started  at: 2016-07-27 21:47:01.482322
State 02 started  at: 2016-07-27 21:47:01.497947
State 11 started  at: 2016-07-27 21:47:01.529198
State 10 started  at: 2016-07-27 21:47:01.497947
    State 11 finished  at: 2016-07-27 21:47:15.701207
    State 15 finished  at: 2016-07-27 21:47:24.123164
    State 44 finished  at: 2016-07-27 21:47:32.029489
    State 50 finished  at: 2016-07-27 21:47:51.203107
    State 10 finished  at: 2016-07-27 21:47:53.046876
    State 33 finished  at: 2016-07-27 21:47:58.156301
    State 02 finished  at: 2016-07-27 21:48:18.856979

All done at 2016-07-27 21:48:18.992277

Why?

Note that the OS is Windows Server 2012 R2.

1 answer:

Answer 0: (score: 1)

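Since you're running on Windows, nothing is inherited by worker processes. Windows has no fork(), so each process runs the entire main program "from scratch".

In particular, with the code as written, every process has its own instance of lock, and these instances have nothing to do with each other. In short, lock isn't supplying any inter-process mutual exclusion at all.

To fix this, the Pool constructor can be changed to call a once-per-process initialization function, to which you pass an instance of Lock(). For example, like so: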

def init(L):
    global lock
    lock = L

Then add these arguments to the Pool() constructor:

initializer=init, initargs=(Lock(),),

You no longer need the module-level

lock = Lock()

line.

Inter-process mutual exclusion will then work as intended.
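Putting the pieces above together, a minimal sketch might look like the following. The names init_worker, work and run_all are hypothetical, work() is only a stand-in for conf_wrapper() with the database call left out, and the output path is passed in explicitly rather than read from a global.

```python
from multiprocessing import Pool, Lock

lock = None  # assigned in every worker process by init_worker()


def init_worker(shared_lock):
    """Runs once in each worker process and stores the one shared lock."""
    global lock
    lock = shared_lock


def writer(path, message):
    """Append one message to the file while holding the shared lock."""
    with lock:  # released even if the write raises
        with open(path, "a") as out:
            out.write(message)


def work(args):
    """Hypothetical stand-in for conf_wrapper(): logs a start and a finish line."""
    path, stcd = args
    writer(path, "State {:02d} started\n".format(stcd))
    writer(path, "\tState {:02d} finished\n".format(stcd))
    return stcd


def run_all(path, states, nprocesses=4):
    """Create the pool so that every worker receives the same Lock instance."""
    with Pool(processes=nprocesses, initializer=init_worker,
              initargs=(Lock(),)) as pool:
        return pool.map(work, [(path, s) for s in states], chunksize=1)
```

On Windows, run_all() should be called from under an `if __name__ == "__main__":` guard, because each worker re-imports the main module.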

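WITHOUT A LOCK

If you'd like to delegate all output to a single writer process, you can skip the lock and use a queue to feed that process instead [see LATER below for a different version].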

def writer_process(q):
    with open(console_out, 'w') as out:
        while True:
            message = q.get()
            if message is None:
                break
            out.write(message)
            out.flush() # can't guess whether you really want this

And change writer() to just:

def writer(message):
    q.put(message)

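You would again need to use initializer= and initargs= in the Pool constructor so that all processes use the same queue.

Only one process should run writer_process(), and it can be started on its own as an instance of multiprocessing.Process.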

Finally, to let writer_process() know it's time to quit, that is, time for it to drain the queue and return, just run

q.put(None)

in the main process.
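The queue-based approach described above could be wired together roughly as follows. This is a sketch under assumptions, not the original poster's code: init_worker, work and run_all are hypothetical names, work() stands in for conf_wrapper(), and the queue and path are passed to writer_process() explicitly. The pool is closed and joined before the sentinel is sent, so that workers exit normally and any messages still buffered in their queue feeder threads are flushed first.

```python
from multiprocessing import Pool, Process, Queue

q = None  # assigned in every worker process by init_worker()


def init_worker(shared_queue):
    """Runs once in each worker process and stores the shared queue."""
    global q
    q = shared_queue


def writer(message):
    """Hand the message to the writer process instead of touching the file."""
    q.put(message)


def writer_process(msg_queue, path):
    """Sole owner of the file: write messages until the None sentinel arrives."""
    with open(path, "w") as out:
        while True:
            message = msg_queue.get()
            if message is None:
                break
            out.write(message)
            out.flush()


def work(stcd):
    """Hypothetical stand-in for conf_wrapper()."""
    writer("State {:02d} started\n".format(stcd))
    writer("\tState {:02d} finished\n".format(stcd))
    return stcd


def run_all(path, states, nprocesses=4):
    msg_queue = Queue()
    logger = Process(target=writer_process, args=(msg_queue, path))
    logger.start()
    with Pool(processes=nprocesses, initializer=init_worker,
              initargs=(msg_queue,)) as pool:
        results = pool.map(work, states, chunksize=1)
        pool.close()
        pool.join()  # workers exit normally, flushing their queued messages
    msg_queue.put(None)  # every worker is done: tell the writer to finish
    logger.join()
    return results
```

As with the lock version, run_all() belongs under an `if __name__ == "__main__":` guard on Windows.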

LATER

The OP settled on this version instead, because they needed to be able to open the output file from other code at the same time:

def writer_process(q):
    while True:
        message = q.get()
        if message == 'done':
            break
        else:
            with open(console_out, 'a') as out:
                out.write(message)

I don't know why the terminating sentinel was changed to "done". Any unique value works for this; None is traditional.
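One caveat on choosing a sentinel: a multiprocessing queue pickles each message, so the receiver gets a copy rather than the same object. The small sketch below imitates that round trip with pickle directly (round_trip is a hypothetical helper, not part of multiprocessing). It suggests why None tested with `is`, or a string tested with `==`, is a safe choice, while a freshly created object() compared by identity would not be recognised on the other side.

```python
import pickle


def round_trip(value):
    """Imitate sending a value through a queue: pickle it, then unpickle it."""
    return pickle.loads(pickle.dumps(value))


UNIQUE = object()  # looks like a tempting "unique" sentinel

print(round_trip(None) is None)      # True: None is a singleton
print(round_trip("done") == "done")  # True: strings compare by value
print(round_trip(UNIQUE) is UNIQUE)  # False: the receiver gets a new object
```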