python, subprocess: start a new process when one (in a group) terminates

Asked: 2014-12-19 13:22:13

Tags: python subprocess

I have n files, each of which can be analyzed independently by the same Python script, analysis.py. In a wrapper script, wrapper.py, I loop over the files and call analysis.py as a separate process with subprocess.Popen:

for a_file in all_files:
    command = "python analysis.py %s" % a_file
    analysis_process = subprocess.Popen(
        shlex.split(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE)
    analysis_process.wait()

Now I would like to use all k CPU cores of my machine to speed up the whole analysis. Is there a way to keep k-1 processes running for as long as there are files left to analyze?

1 Answer:

Answer 0 (score: 3)

This outlines how to use multiprocessing.Pool, which exists for exactly these kinds of tasks:

from multiprocessing import Pool, cpu_count

# ...
all_files = ["file%d" % i for i in range(5)]


def process_file(file_name):
    # process file
    return "finished file %s" % file_name

pool = Pool(cpu_count())

# this is a blocking call - when it's done, all files have been processed
results = pool.map(process_file, all_files)

# no more tasks can go in the pool
pool.close()

# wait for all workers to complete their task (though we used a blocking call...)
pool.join()


# ['finished file file0', 'finished file file1',  ... , 'finished file file4']
print(results)
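To connect this back to the original wrapper, the worker function can itself invoke analysis.py as a subprocess. A minimal sketch, assuming analysis.py takes the file name as its only argument (the script name comes from the question; run_analysis is a hypothetical helper), and using cpu_count() - 1 workers as the question asks:

```python
import subprocess
import sys
from multiprocessing import Pool, cpu_count


def run_analysis(file_name):
    # run analysis.py on one file as a separate process;
    # sys.executable is the interpreter running this script
    result = subprocess.run(
        [sys.executable, "analysis.py", file_name],
        capture_output=True, text=True)
    return file_name, result.returncode


if __name__ == "__main__":
    all_files = ["file%d" % i for i in range(5)]
    # keep k-1 workers busy for as long as files remain
    with Pool(max(1, cpu_count() - 1)) as pool:
        for name, code in pool.imap_unordered(run_analysis, all_files):
            print("%s finished with exit code %d" % (name, code))
```

imap_unordered yields results as each worker finishes, so a new file is dispatched as soon as a slot frees up, which is exactly the "start a new process when one terminates" behavior asked for.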

Adding Joel's comment, which points out a common pitfall:

    Make sure the function passed to pool.map() only contains objects defined at module level. Python multiprocessing uses pickle to pass objects between processes, and pickle has problems with things like functions defined in a nested scope.

The docs for what can be pickled
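The pitfall can be demonstrated directly with pickle, the mechanism multiprocessing uses to ship work to its workers. A standalone sketch (the function names here are made up for illustration):

```python
import pickle


def make_worker():
    # defined in a nested scope -- pickle cannot serialize this by reference
    def worker(x):
        return x * 2
    return worker


def module_level_worker(x):
    return x * 2


# a module-level function round-trips through pickle fine
restored = pickle.loads(pickle.dumps(module_level_worker))
assert restored(3) == 6

# the nested function raises an error when pickled, which is why
# pool.map(make_worker(), ...) would fail
try:
    pickle.dumps(make_worker())
except (pickle.PicklingError, AttributeError) as exc:
    print("cannot pickle nested function:", exc)
```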