
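I am trying to use ProcessPoolExecutor from concurrent.futures to parallelize processing of a large number of files. The processing itself is relatively simple, but the I/O is heavy: thousands of 2 GB input files are read, and a 2 GB output file is written for each one. Here is the pseudocode: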

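import collections
import concurrent.futures

def process_single_file(file):
    output = process(file)  # placeholder: light per-file processing
    write(output)           # placeholder: writes a ~2 GB output file

if __name__ == "__main__":  # required when workers are started with "spawn"
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # deque(maxlen=0) drains the iterator without keeping any results
        collections.deque(executor.map(process_single_file, file_list), maxlen=0)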

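I ran the code on a 48-core workstation and the performance was very poor: the program frequently freezes. On my laptop, with an SSD and a few test files, it actually ran very well (much faster than the non-parallel version).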
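A minimal runnable sketch of the pseudocode, under stated assumptions: `process` is a hypothetical stand-in (it just upper-cases bytes), the real work is assumed to be doable chunk by chunk, and the `.out` naming is made up. Two things differ from the original loop. Each file is streamed in 64 MiB chunks, so a worker never holds a whole 2 GB input or output in memory; if `process(file)` really returns the entire output, 48 workers could plausibly need close to 100 GB of RAM at once. And the pool size is explicit: `ProcessPoolExecutor()` with no argument starts one worker per CPU, i.e. 48 here, which likely means 48 concurrent 2 GB readers and writers competing for the same disk.

```python
import collections
import concurrent.futures
import sys

CHUNK_SIZE = 64 * 1024 * 1024  # read 64 MiB at a time instead of a whole 2 GB file


def process(chunk: bytes) -> bytes:
    # Hypothetical stand-in for the real processing;
    # assumes the work can be done independently per chunk.
    return chunk.upper()


def process_single_file(path: str) -> str:
    out_path = path + ".out"  # hypothetical naming scheme for the output file
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:  # end of file
                break
            dst.write(process(chunk))
    return out_path


def process_all(file_list, max_workers=4):
    # Explicit, small pool size: an assumption to tune, not a recommended value.
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Draining map() also re-raises any exception raised in a worker.
        collections.deque(executor.map(process_single_file, file_list), maxlen=0)


if __name__ == "__main__":
    process_all(sys.argv[1:])
```

Usage would be something like `python process_files.py in1.bin in2.bin ...` (script name hypothetical); `max_workers=4` is only a starting point, and timing a few values on the workstation's actual disks is the reliable way to pick one.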

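I have no experience with parallel programming and would like to know what is causing the freezes, and what the best practice is in Python for this situation.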
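Since the processing is described as light and the cost is mostly I/O, one alternative worth measuring (a swapped-in technique, not something from the question) is a bounded ThreadPoolExecutor: threads avoid starting processes and pickling arguments, and CPython releases the GIL during blocking file reads and writes, so I/O-bound jobs can still overlap. `process_single_file` here is a hypothetical stand-in that only copies each file in 1 MiB chunks, and `process_all_threaded` is a made-up name.

```python
import concurrent.futures


def process_single_file(path: str) -> str:
    # Hypothetical stand-in: copy the file in 1 MiB chunks to "<path>.out".
    out_path = path + ".out"
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        for chunk in iter(lambda: src.read(1 << 20), b""):
            dst.write(chunk)
    return out_path


def process_all_threaded(file_list, max_workers=4):
    """Run process_single_file over file_list with a bounded thread pool.

    Returns the sorted list of output paths.
    """
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(process_single_file, p): p for p in file_list}
        for fut in concurrent.futures.as_completed(futures):
            results.append(fut.result())  # re-raises a worker's exception here
    return sorted(results)
```

With `submit` plus `as_completed`, a failing file surfaces in the main thread as soon as its result is collected. If `process` turns out to be CPU-heavy after all, the process pool remains the better fit.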

Thanks!

Parallelizing heavy I/O in Python with concurrent.futures
