Python: executing multiple commands with multiprocessing

Date: 2018-11-28 14:49:48

Tags: python multiprocessing

I have a script that uses multiprocessing to process files. Here is a snippet:

import multiprocessing
import os

cores = multiprocessing.cpu_count()

def f_process_file(file):
    # rename file
    # convert file
    # add metadata
    ...

files = [f for f in os.listdir(source_path) if f.endswith('.tif')]
p = multiprocessing.Pool(processes=cores)
async_result = p.map_async(f_process_file, files)
p.close()
p.join()

This runs fine, except that I have to do some other operations before calling f_process_file, which now takes additional arguments. Here is the snippet:

def f_process_file(file, inventory, variety):
    if variety > 1:
        # rename file with follow-up number
        # convert file
        # add metadata
        ...
    else:
        # rename file without follow-up number
        # convert file
        # add metadata
        ...

import collections

# create file list
files = [f for f in os.listdir(source_path) if f.endswith('.tif')]
# create inventory list
inventories = [fn.split('_')[2].split('-')[0].split('.')[0] for fn in files]
# check the number of files per inventory
counter = collections.Counter(inventories)

for file in files:
    inventory = file.split('_')[2].split('-')[0].split('.')[0]
    matching = [s for s in sorted(counter.items()) if inventory in s]
    for key,variety in matching:  
        f_process_file(file, inventory, variety)
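For reference, the inventory-ID parsing and counting in the snippet above works like this. The filenames here are made up purely for illustration; the question only implies that the inventory ID is the third underscore-separated field, before any `-` or `.`:

```python
import collections

# Hypothetical filenames following the pattern the question assumes
files = ['scan_001_INV123-1.tif', 'scan_001_INV123-2.tif', 'scan_002_INV456-1.tif']

# Extract the inventory ID: third underscore field, stripped of '-' and '.' suffixes
inventories = [fn.split('_')[2].split('-')[0].split('.')[0] for fn in files]

counter = collections.Counter(inventories)
print(counter)  # Counter({'INV123': 2, 'INV456': 1})
```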

I can't get this to work with multiprocessing. Do you have any suggestions?

2 Answers:

Answer 0 (score: 1)

I found this question and managed to solve my problem with apply_async. Here is the snippet:

cores = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes=cores)
for file in files:
    inventory = file.split('_')[2].split('-')[0].split('.')[0]
    matching = [s for s in sorted(counter.items()) if inventory in s]
    for key, variety in matching:
        pool.apply_async(f_process_file, (source, file, tmp, target, inventory, variety))
pool.close()
pool.join()
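A variant of the same idea: since all the calls and their arguments can be built up front, `Pool.starmap` can replace the `apply_async` loop and also collect return values. This is a minimal sketch with simplified arguments; `f_process_file` here is a stand-in, and `ThreadPool` (which has the same API as `multiprocessing.Pool`) is used only so the snippet runs as-is anywhere — swap in `multiprocessing.Pool` for real CPU-bound work:

```python
from multiprocessing.pool import ThreadPool  # same API as multiprocessing.Pool

def f_process_file(file, inventory, variety):
    # Stand-in for the real rename/convert/metadata work
    return f'{file}:{inventory}:{variety}'

# One argument tuple per call, built before the pool is started
tasks = [('a_1_INV1-1.tif', 'INV1', 1), ('b_1_INV2-1.tif', 'INV2', 2)]

with ThreadPool(processes=2) as pool:
    results = pool.starmap(f_process_file, tasks)
print(results)  # ['a_1_INV1-1.tif:INV1:1', 'b_1_INV2-1.tif:INV2:2']
```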

Answer 1 (score: 0)

The problem here is that your workload is not a natural fit for multiprocessing.Pool. You are doing a nested iteration, so the work items only become known incrementally. There are two ways to solve this. The first is to do the single-threaded computation up front and only then use the Pool. To do that, first construct an object, which I'll call ProcessingArgs:

class ProcessingArgs:

    def __init__(self, file, inventory, variety):
        self.File = file
        self.Inventory = inventory
        self.Variety = variety

Then you can either modify f_process_file to take a ProcessingArgs, or add a wrapper method that unpacks the object and calls f_process_file. Either way, your for loop now looks like this:

needs_processing = []
for file in files:
    inventory = file.split('_')[2].split('-')[0].split('.')[0]
    matching = [s for s in sorted(counter.items()) if inventory in s]
    needs_processing.extend( [ProcessingArgs(file, inventory, variety) for key, variety in matching] )

p = multiprocessing.Pool(processes = cores)
async_result = p.map_async(f_process_file, needs_processing)
p.close()
p.join()
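Putting this answer's pieces together, a minimal runnable sketch might look like the following. The work itself is a stand-in, and `ThreadPool` (identical API to `multiprocessing.Pool`) is used only so the example runs as-is:

```python
from multiprocessing.pool import ThreadPool  # same API as multiprocessing.Pool

class ProcessingArgs:
    def __init__(self, file, inventory, variety):
        self.File = file
        self.Inventory = inventory
        self.Variety = variety

def f_process_file(args):
    # Takes a single ProcessingArgs, so it fits map_async's one-argument shape;
    # stand-in for the real rename/convert/metadata work
    return f'{args.File}:{args.Inventory}:{args.Variety}'

# Built up front by the single-threaded loop
needs_processing = [ProcessingArgs('a.tif', 'INV1', 1),
                    ProcessingArgs('b.tif', 'INV2', 2)]

with ThreadPool(processes=2) as pool:
    async_result = pool.map_async(f_process_file, needs_processing)
    results = async_result.get()  # wait for and collect the results
print(results)  # ['a.tif:INV1:1', 'b.tif:INV2:2']
```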

The other option is to use the asyncio library:

import asyncio

results = await asyncio.gather(*(f_process_file(p) for p in needs_processing))

In this case you need to prefix def f_process_file with the async keyword so that asyncio knows it is a coroutine function.
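A complete, runnable version of the asyncio approach might look like this. The function body is a stand-in; note that truly blocking file work would need to go through `asyncio.to_thread()` or an executor, since asyncio alone does not parallelize CPU-bound code:

```python
import asyncio

async def f_process_file(file, inventory, variety):
    # Stand-in for the real work; blocking calls should be wrapped
    # in asyncio.to_thread() or loop.run_in_executor()
    return f'{file}:{inventory}:{variety}'

async def main():
    tasks = [('a.tif', 'INV1', 1), ('b.tif', 'INV2', 2)]
    # gather() takes awaitables as separate arguments, hence the unpacking
    return await asyncio.gather(*(f_process_file(*t) for t in tasks))

results = asyncio.run(main())
print(results)  # ['a.tif:INV1:1', 'b.tif:INV2:2']
```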