I have a script that processes files using multiprocessing. Here is a snippet:
import multiprocessing
import os

cores = multiprocessing.cpu_count()

def f_process_file(file):
    # rename file
    # convert file
    # add metadata
    ...

files = [f for f in os.listdir(source_path) if f.endswith('.tif')]

p = multiprocessing.Pool(processes=cores)
async_result = p.map_async(f_process_file, files)
p.close()
p.join()
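For reference, the pattern above can be reduced to a self-contained, runnable sketch. It uses `multiprocessing.pool.ThreadPool`, which shares `Pool`'s API, so no real files are needed; the worker body and filenames are hypothetical placeholders for the rename/convert/metadata steps:

```python
from multiprocessing.pool import ThreadPool  # same API as multiprocessing.Pool

def f_process_file(file):
    # placeholder for the real rename/convert/add-metadata steps
    return file.upper()

files = ["a.tif", "b.tif"]

p = ThreadPool(processes=2)
async_result = p.map_async(f_process_file, files)
p.close()
p.join()

# map_async preserves input order in its result list
results = async_result.get()
```

For CPU-bound work, swap `ThreadPool` for `multiprocessing.Pool` with a top-level worker function.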
That runs fine, except that I have to do some other operations first before calling f_process_file with additional arguments. Here is the snippet:
import collections

def f_process_file(file, inventory, variety):
    if variety > 1:
        # rename file with follow-up number
        # convert file
        # add metadata
        ...
    else:
        # rename file without follow-up number
        # convert file
        # add metadata
        ...

# create file list
files = [f for f in os.listdir(source_path) if f.endswith('.tif')]

# create inventory list
inventories = [fn.split('_')[2].split('-')[0].split('.')[0] for fn in files]

# check the number of files per inventory
counter = collections.Counter(inventories)

for file in files:
    inventory = file.split('_')[2].split('-')[0].split('.')[0]
    matching = [s for s in sorted(counter.items()) if inventory in s]
    for key, variety in matching:
        f_process_file(file, inventory, variety)
I have not been able to get this to work with multiprocessing. Do you have any suggestions?
Answer 0 (score: 1)
I found this question and managed to solve my problem with apply_async. Here is the snippet:
cores = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes=cores)

for file in files:
    inventory = file.split('_')[2].split('-')[0].split('.')[0]
    matching = [s for s in sorted(counter.items()) if inventory in s]
    for key, variety in matching:
        pool.apply_async(f_process_file, (source, file, tmp, target, inventory, variety))

pool.close()
pool.join()
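The apply_async pattern can be shown in a self-contained sketch. As before, it uses `ThreadPool` (same API as `Pool`) so it runs without real files; the worker and its argument tuples are hypothetical stand-ins for the asker's:

```python
from multiprocessing.pool import ThreadPool  # same API as multiprocessing.Pool

def f_process_file(file, inventory, variety):
    # placeholder worker: real code would rename/convert/tag the file
    return f"{inventory}-{variety}:{file}"

tasks = [("a.tif", "INV1", 1), ("b.tif", "INV1", 2)]

pool = ThreadPool(processes=2)
# apply_async submits one call per argument tuple and returns an AsyncResult
pending = [pool.apply_async(f_process_file, args) for args in tasks]
pool.close()
pool.join()

output = [r.get() for r in pending]
```

Keeping the `AsyncResult` objects and calling `.get()` also surfaces any exception raised in a worker, which `apply_async` would otherwise swallow silently.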
Answer 1 (score: 0)
The problem here is that your workload does not lend itself well to multiprocessing.Pool. You are doing nested iteration, and as a result you may be accessing multiple workloads incrementally. There are two ways to solve your problem. The first is to do the single-threaded computation first, and only then use the Pool. To do this, first construct an object; I will call it ProcessingArgs:
class ProcessingArgs:
    def __init__(self, file, inventory, variety):
        self.File = file
        self.Inventory = inventory
        self.Variety = variety
Then you can either modify f_process_file to take a ProcessingArgs, or add a wrapper method that unpacks the class and calls f_process_file. Either way, your for loop now looks like this:
needs_processing = []

for file in files:
    inventory = file.split('_')[2].split('-')[0].split('.')[0]
    matching = [s for s in sorted(counter.items()) if inventory in s]
    needs_processing.extend([ProcessingArgs(file, inventory, variety) for key, variety in matching])

p = multiprocessing.Pool(processes=cores)
async_result = p.map_async(f_process_file, needs_processing)
p.close()
p.join()
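This single-argument-object pattern can be sketched end to end. The sketch below uses `ThreadPool` (same API as `Pool`) so it is self-contained, and the worker body is a hypothetical placeholder:

```python
from multiprocessing.pool import ThreadPool  # same API as multiprocessing.Pool

class ProcessingArgs:
    def __init__(self, file, inventory, variety):
        self.File = file
        self.Inventory = inventory
        self.Variety = variety

def f_process_file(args):
    # the worker unpacks the single argument object itself
    return f"{args.Inventory}-{args.Variety}:{args.File}"

needs_processing = [
    ProcessingArgs("a.tif", "INV1", 1),
    ProcessingArgs("b.tif", "INV1", 2),
]

p = ThreadPool(processes=2)
async_result = p.map_async(f_process_file, needs_processing)
p.close()
p.join()

results = async_result.get()
```

As an alternative to the wrapper object, `Pool.starmap` accepts an iterable of argument tuples and unpacks them into a multi-argument worker directly.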
The other option is to use the asyncio library:

import asyncio

results = await asyncio.gather(*(f_process_file(p) for p in needs_processing))
In that case, you need to put the async keyword before def f_process_file so that asyncio knows it is an asynchronous function.
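Put together, the asyncio variant looks like this. Note that `await` is only valid inside an async function, so the gather is wrapped in a coroutine driven by `asyncio.run`; the worker body is a hypothetical placeholder and this approach only helps when the work is I/O-bound:

```python
import asyncio

async def f_process_file(args):
    # placeholder async worker; real code would await I/O-bound steps here
    await asyncio.sleep(0)
    return args.upper()

async def main():
    needs_processing = ["a.tif", "b.tif"]
    # each coroutine is unpacked into gather; results keep input order
    return await asyncio.gather(*(f_process_file(p) for p in needs_processing))

results = asyncio.run(main())
```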