The title is very generic, but the question probably isn't.
I have a script that compiles some code using parameters passed in from a file (an xls file). Based on the number of configurations in the xls, I have to compile certain files. I want to store the result of each compilation (stdout and stderr) in a text file whose name comes from the configuration.
I've managed to do all of this, but to speed things up I'd like to run all the compilations in parallel. Is there a way to do that?
Sample code:
import subprocess

for n in num_rows:  # num_rows stores all the rows read using the xlrd object
    parameters_list = [...]  # has all the parameters read from xls
    ...
    logfile = open(...)  # name is based on a name read from xls
    p = subprocess.Popen(parameters_list, stderr=logfile)
    p.wait()
    logfile.close()
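(For context, a minimal sketch of how the rows might be read with xlrd; the file name and the choice to store whole row values are assumptions, not the original code:)

import xlrd

book = xlrd.open_workbook("configs.xls")  # assumed file name
sheet = book.sheet_by_index(0)
num_rows = [sheet.row_values(i) for i in range(sheet.nrows)]  # all rows from the sheet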
I have to wait for each process to finish before I can close its file.
My question may be a bit long, but any help or hint is welcome.
Answer 0 (score: 2):
You can do this with multiprocessing.Pool:
import multiprocessing
import subprocess

def parse_row(n):
    parameters_list = [...]  # has all the parameters read from xls
    ...
    logfile = open(...)  # name is based on a name read from xls
    p = subprocess.Popen(parameters_list, stderr=logfile)
    p.wait()
    logfile.close()

pool = multiprocessing.Pool()
pool.map_async(parse_row, num_rows)
pool.close()
pool.join()
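For reference, here is a minimal self-contained sketch of that approach. The read_rows, build_command, and logfile_name helpers are hypothetical stand-ins for the xls-reading code, and both stdout and stderr go to the same log file, since the question asks for both streams:

import multiprocessing
import subprocess

def parse_row(row):
    command = build_command(row)  # hypothetical: turn one xls row into a command line
    with open(logfile_name(row), "w") as logfile:  # hypothetical: log file name from the row
        # capture stdout and stderr of the compilation in the same file
        subprocess.call(command, stdout=logfile, stderr=subprocess.STDOUT)

if __name__ == "__main__":
    rows = read_rows("configs.xls")  # hypothetical xlrd-based reader
    pool = multiprocessing.Pool()    # defaults to one worker process per CPU
    pool.map(parse_row, rows)        # blocks until every row has been compiled
    pool.close()
    pool.join()

Note that map (unlike map_async) blocks until all workers are done, so close/join here only shut the pool down cleanly.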
Answer 1 (score: 1):
Assuming your processes all write to different log files, the answer is simple: the subprocess module already runs processes in parallel. Just create a separate Popen object for each one and store them all in a list:
processes = []
logfiles = []
for n in num_rows:  # num_rows stores all the rows read using the xlrd object
    parameters_list = [...]  # has all the parameters read from xls
    ...
    logfile = open(...)  # name is based on a name read from xls
    logfiles.append(logfile)
    p = subprocess.Popen(parameters_list, stderr=logfile)
    processes.append(p)

# Now, outside the for loop, the processes are all running in parallel.
# We can just wait for each of them to finish and close its corresponding logfile.
for p, logfile in zip(processes, logfiles):
    p.wait()  # this returns instantly if that process has already finished
    logfile.close()
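If you also want stdout in the same file and want to cap how many compilations run at once, a variation on the same idea could look like the sketch below; build_command and logfile_name are hypothetical helpers, and MAX_PARALLEL is an assumed limit:

import subprocess

MAX_PARALLEL = 4  # assumed limit on simultaneous compilations

running = []      # (process, logfile) pairs still in flight
for row in rows:  # rows would come from the xlrd-based reading code
    command = build_command(row)            # hypothetical helper
    logfile = open(logfile_name(row), "w")  # hypothetical helper
    p = subprocess.Popen(command, stdout=logfile, stderr=subprocess.STDOUT)
    running.append((p, logfile))

    # once the limit is reached, wait for the oldest process before launching more
    if len(running) >= MAX_PARALLEL:
        oldest_p, oldest_log = running.pop(0)
        oldest_p.wait()
        oldest_log.close()

# wait for whatever is still running and close its log file
for p, logfile in running:
    p.wait()
    logfile.close()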