Python multiprocessing - logging.FileHandler object raises PicklingError

Asked: 2014-07-15 13:38:45

Tags: python python-2.7 multiprocessing pickle

Handlers from the logging module don't seem to mix with multiprocessing jobs:

import functools
import logging
import multiprocessing as mp

logger = logging.getLogger( 'myLogger' )
handler = logging.FileHandler( 'logFile' )

def worker( x, handler ) :
    print x ** 2

pWorker = functools.partial( worker, handler=handler )

#
if __name__ == '__main__' :
    pool = mp.Pool( processes=1 )
    pool.map( pWorker, range(3) )
    pool.close()
    pool.join()

Output:

cPickle.PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed

If I replace pWorker with either of the following versions, no error is raised:

# this works
def pWorker( x ) :
    worker( x, handler )

# this works too
pWorker = functools.partial( worker, handler=open( 'logFile' ) )

I don't really understand the PicklingError. Is it because objects of the logging.FileHandler class aren't picklable? (I googled around but couldn't find anything.)
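The suspicion can be checked directly by trying to pickle a handler. A minimal sketch (written in Python 3 syntax, where the failure surfaces as a TypeError about the handler's internal lock rather than cPickle.PicklingError; the temp-file path is just for illustration):

```python
import logging
import os
import pickle
import tempfile

# Hypothetical throwaway log file standing in for 'logFile'.
path = os.path.join(tempfile.mkdtemp(), 'logFile')
handler = logging.FileHandler(path)

try:
    pickle.dumps(handler)
    picklable = True
except Exception:
    # On Python 3 this raises TypeError (the handler's thread lock
    # and open stream cannot be pickled); Python 2 raised
    # cPickle.PicklingError instead.
    picklable = False
finally:
    handler.close()

print(picklable)  # → False
```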

1 Answer:

Answer 0 (score: 2):

FileHandler objects internally use a threading.Lock to synchronize writes between threads. However, the thread.lock object returned by threading.Lock can't be pickled, which means it can't be sent between processes, and that is exactly what's required to ship it to the worker processes via pool.map.

The multiprocessing documentation has a section on using logging with multiprocessing here. Basically, you need to let the child processes inherit the parent's logger, rather than trying to pass the logger or handler explicitly through the map call.

Note that on Linux, you can do this:

from multiprocessing import Pool
import logging

logger = logging.getLogger( 'myLogger' )


def worker(x):
    print handler
    print x ** 2

def initializer(handle):
    global handler
    handler = handle

if __name__ == "__main__":
    handler = logging.FileHandler( 'logFile' )
    #pWorker = functools.partial( worker, handler=handler )
    pool = Pool(processes=4, initializer=initializer, initargs=(handler,))
    pool.map(worker, range(3))
    pool.close()
    pool.join()

initializer/initargs run a function once in each of the pool's child processes as soon as it starts. On Linux, because of the way os.fork works, this lets the handler reach the children via inheritance. However, this won't work on Windows: since Windows lacks support for os.fork, the handler would still have to be pickled in order to be passed through initargs.