I have a parallel program. It reads files sequentially, and the tasks within each file are split across the processes. Once a file has been finished on all processes, the next file is loaded, and so on. I want to write log files such that each data file gets a new log file, and I want all of my processes to write logging information without interfering with each other. After reading some posts and the logging documentation, I came up with the following minimal example:
import numpy as np
import matplotlib.pyplot as plt
from time import time
import multiprocessing, pathos
import logging
def task(x):
    thisID = pathos.core.getpid()
    logger.info("Process " + str(thisID) + ": Processing stuff " + x)
    return 1

for iJob in range(3):
    # Create file handler
    fh = logging.FileHandler('log'+str(iJob)+'_pathos.txt')
    fh.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
    fh.setFormatter(formatter)
    logger = pathos.logger(level=logging.DEBUG, handler=fh)
    pool = pathos.multiprocessing.ProcessingPool(7)
    results_mp = pool.map(task, list("aalkfnalkgnlkaerngnarngkwlekfwebkwr"))
    logger.removeHandler(fh)
    print(results_mp)
No matter what I try, all output ends up in the first log file; the other two are created but stay empty. An alternative implementation using bare multiprocessing seems to work fine (see below). The problem is that I need pathos, because it lets me parallelize some imported libraries that regular multiprocessing refuses to work with:
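The limitation mentioned above can be reproduced without any pool at all. The minimal sketch below (standard library only, with an illustrative lambda) shows why plain multiprocessing rejects certain callables: it serializes tasks with pickle, whereas pathos serializes with dill, which handles lambdas and interactively defined objects.

```python
import pickle

# multiprocessing ships the task function to workers via pickle,
# which cannot serialize lambdas (they are pickled by name, and the
# name "<lambda>" cannot be looked up again).
f = lambda x: x + 1

try:
    pickle.dumps(f)
    print("pickled OK")
except Exception as exc:
    # pathos avoids this by using dill instead of pickle
    print("pickle failed:", type(exc).__name__)
```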
def task(x):
    thisID = multiprocessing.current_process()._identity[0]
    logger.info("Process " + str(thisID) + ": Processing stuff " + x)
    return 1

for iJob in range(3):
    # Create file handler
    fh = logging.FileHandler('log'+str(iJob)+'_pathos.txt')
    fh.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
    fh.setFormatter(formatter)
    logger = logging.getLogger("MyLogger")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(fh)
    pool = multiprocessing.Pool(7)
    results_mp = pool.map(task, list("aalkfnalkgnlkaerngnarngkwlekfwebkwr"))
    logger.removeHandler(fh)
    print(results_mp)
Perhaps worth mentioning: I run the code from a Jupyter notebook. Also, when I run the same cell twice, some of the intermediate log files get deleted, so there is some erratic behavior. Sometimes all the new log files are empty.
Answer 0 (score: 0)
ProcessPool creates new worker processes that have their own memory. So you cannot/should not rely on global variables like logger; pass everything the task needs as arguments to pool.map().

This works for me:
import numpy as np
import matplotlib.pyplot as plt
from time import time
import multiprocessing, pathos
import logging
def task(x, iJob):
    thisID = pathos.core.getpid()
    fh = logging.FileHandler('log'+str(iJob)+'_pathos.txt')
    fh.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
    fh.setFormatter(formatter)
    logger = pathos.logger(level=logging.DEBUG, handler=fh)
    logger.info("Process " + str(thisID) + ": Processing stuff " + x)
    logger.removeHandler(fh)
    return 1

for iJob in range(3):
    pool = pathos.multiprocessing.ProcessPool(7)
    input = "aalkfnalkgnlkaerngnarngkwlekfwebkwr"
    results_mp = pool.map(task, list(input), [iJob] * len(input))
    print(results_mp)