How to correctly log with multiprocessing in Pathos

Time: 2019-09-17 16:33:41

Tags: python logging multiprocessing pathos

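I have a parallel program. It reads files sequentially, and the tasks within each file are split across several processes. Once all processes have finished a file, the next file is loaded, and so on. I want to write log files so that each data file gets a new log file. I want all of my processes to write log messages without interfering with each other. After reading some posts and the logging documentation, I came up with the following minimal example: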

import numpy as np
import matplotlib.pyplot as plt
from time import time
import multiprocessing, pathos
import logging

def task(x):
    thisID = pathos.core.getpid()
    logger.info("Process " + str(thisID) + ": Processing stuff " + x)
    return 1

for iJob in range(3):
    # Create file handler
    fh = logging.FileHandler('log'+str(iJob)+'_pathos.txt')
    fh.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
    fh.setFormatter(formatter)

    logger = pathos.logger(level=logging.DEBUG, handler=fh)

    pool = pathos.multiprocessing.ProcessingPool(7)
    results_mp = pool.map(task, list("aalkfnalkgnlkaerngnarngkwlekfwebkwr"))

    logger.removeHandler(fh)

    print(results_mp)

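No matter what I try, all output goes to the first log file; the other two are created but stay empty. An alternative implementation using bare multiprocessing seems to work fine (see below). The problem is that I need pathos, because it lets me parallelize some imported libraries that regular multiprocessing refuses to work with.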

def task(x):
    thisID = multiprocessing.current_process()._identity[0]
    logger.info("Process " + str(thisID) + ": Processing stuff " + x)
    return 1

for iJob in range(3):
    # Create file handler
    fh = logging.FileHandler('log'+str(iJob)+'_pathos.txt')
    fh.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
    fh.setFormatter(formatter)

    logger = logging.getLogger("MyLogger")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(fh)

    pool = multiprocessing.Pool(7)
    results_mp = pool.map(task, list("aalkfnalkgnlkaerngnarngkwlekfwebkwr"))

    logger.removeHandler(fh)

    print(results_mp)

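It may be worth mentioning that I run the code from a Jupyter notebook. The behavior is also somewhat erratic: when I run the same cell twice, some of the intermediate log files get deleted, and sometimes the new log files are all empty.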

1 Answer:

Answer 0: (score: 0)

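ProcessPool creates new worker processes, each of which has its own memory. Therefore you can't (and shouldn't) access global variables from the tasks. Pass everything you need into pool.map().

This worked for me: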

import logging
import pathos

def task(x, iJob):
    thisID = pathos.core.getpid()
    fh = logging.FileHandler('log'+str(iJob)+'_pathos.txt')
    fh.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
    fh.setFormatter(formatter)

    logger = pathos.logger(level=logging.DEBUG, handler=fh)
    logger.info("Process " + str(thisID) + ": Processing stuff " + x)
    logger.removeHandler(fh)
    return 1

for iJob in range(3):
    pool = pathos.multiprocessing.ProcessPool(7)
    data = "aalkfnalkgnlkaerngnarngkwlekfwebkwr"
    # Pass the job index alongside every item so each task opens the right log file
    results_mp = pool.map(task, list(data), [iJob] * len(data))
    print(results_mp)
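
A possible refinement of the approach above, as a minimal sketch rather than a definitive implementation: instead of creating and removing a FileHandler on every task call, each worker could look up a per-job logger and attach the handler only the first time it is needed in that process. The helper name `get_job_logger` and the `log_dir` parameter are hypothetical and not part of pathos or the original answer. The sketch uses only the standard library `logging` module and `os.getpid()` in place of `pathos.logger` and `pathos.core.getpid()`.

```python
import logging
import os


def get_job_logger(job_id, log_dir="."):
    """Return a logger writing to one file per job (hypothetical helper)."""
    logger = logging.getLogger(f"job{job_id}")
    logger.setLevel(logging.DEBUG)
    # Keep records out of the root logger so they are not printed twice.
    logger.propagate = False

    path = os.path.abspath(os.path.join(log_dir, f"log{job_id}_pathos.txt"))
    # Attach the file handler only once per process: later task calls in
    # the same worker reuse it instead of reopening the file.
    already_attached = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == path
        for h in logger.handlers
    )
    if not already_attached:
        fh = logging.FileHandler(path)
        fh.setLevel(logging.DEBUG)
        fh.setFormatter(logging.Formatter(
            '%(asctime)s %(name)-12s %(levelname)-8s %(message)s'))
        logger.addHandler(fh)
    return logger


def task(x, iJob, log_dir="."):
    # Everything the task needs arrives as arguments; no globals are used.
    logger = get_job_logger(iJob, log_dir)
    logger.info("Process %d: Processing stuff %s", os.getpid(), x)
    return 1
```

With pathos, this `task` would be called the same way as in the answer, for example `pool.map(task, list(data), [iJob] * len(data))`. Note that the standard logging module does not serialize writes from several processes to one file, so if lines ever interleave, a queue-based setup (one process doing the writing) may be the safer design.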