Question

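When parallelizing a CPU-bound task with Python's ThreadPool, the memory used by the workers seems to accumulate without ever being released. I tried to simplify the problem:

import numpy as np
from multiprocessing.pool import ThreadPool

def worker(x):
    # Bloat the memory footprint of this function
    a = x ** x
    b = a + x
    c = x / b
    return hash(c.tobytes())

tasks = (np.random.rand(1000, 1000) for _ in range(500))
with ThreadPool(4) as pool:
    for result in pool.imap(worker, tasks):
        assert result is not None

Running this snippet, you can easily see the memory footprint of the Python process swing by a huge amount. However, I expected it to behave almost the same as

for task in tasks:
    assert worker(task) is not None

whose memory cost is negligible.

How can I modify the snippet to apply the worker function to each array using a ThreadPool?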
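The same effect can be reproduced without NumPy. The following is a minimal sketch, assuming 1 MB bytearray buffers stand in for the 1000 × 1000 arrays, a sleep stands in for the CPU-bound work, and the standard-library tracemalloc module serves as the memory gauge; slow_worker, make_tasks and peak_mb are hypothetical names used only for illustration.

```python
import time
import tracemalloc
from multiprocessing.pool import ThreadPool

BUF_SIZE = 1_000_000  # ~1 MB per task, standing in for one 1000x1000 array
N_TASKS = 50


def slow_worker(buf):
    # Simulate slow work on the buffer, then return something small.
    time.sleep(0.01)
    return len(buf)


def make_tasks():
    # Lazy generator: each buffer is allocated only when it is pulled.
    return (bytearray(BUF_SIZE) for _ in range(N_TASKS))


def peak_mb(run):
    # Peak traced Python memory (in MB) while run() executes.
    tracemalloc.start()
    try:
        run()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1e6


def serial():
    for task in make_tasks():
        assert slow_worker(task) is not None


def pooled():
    with ThreadPool(4) as pool:
        for result in pool.imap(slow_worker, make_tasks()):
            assert result is not None


serial_peak = peak_mb(serial)
pooled_peak = peak_mb(pooled)
print(f"serial: {serial_peak:.1f} MB, ThreadPool.imap: {pooled_peak:.1f} MB")
```

The serial loop should peak at roughly two buffers (the current task plus the next one being created), while the imap version should peak much closer to all 50 buffers at once.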

Answer 1

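It turns out the explanation is simple. Changing the example so that the random arrays are created only inside worker fixes the problem: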

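import numpy as np
from multiprocessing.pool import ThreadPool

def worker(x):
    x = x()  # each task is a zero-argument callable: build the array only now
    # Bloat the memory footprint of this function
    a = x ** x
    b = a + x
    c = x / b
    return hash(c.tobytes())

# A generator of cheap callables instead of a generator of 8 MB arrays
tasks = (lambda: np.random.rand(1000, 1000) for _ in range(500))

with ThreadPool(4) as pool:
    for result in pool.imap(worker, tasks):
        assert result is not None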

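ThreadPool.imap does not pull items from the tasks generator lazily. In CPython, a background task-handler thread iterates over the whole generator as fast as it can and queues every item for the workers without waiting for results to be consumed, so in effect the generator is turned into a list. That, of course, means holding all 500 random arrays in memory at once: 500 arrays of 8 MB of float64 each, or roughly 4 GB.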
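This behaviour can be checked with the standard library alone, and the same check shows one way to keep memory bounded when tasks cannot simply be generated inside the worker. The sketch below is a minimal illustration under stated assumptions: integers stand in for the arrays, slow_worker sleeps to simulate the CPU-bound work, and bounded is a hypothetical helper implementing a semaphore-based throttle, which is a different technique from the fix above. The producer takes a slot before handing out each item, and the consumer returns a slot for every result it receives.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def slow_worker(item):
    # Simulate an expensive task so the pool falls behind its input.
    time.sleep(0.01)
    return item


def counting_source(n, produced):
    # Record every item the pool pulls out of the generator.
    for i in range(n):
        produced.append(i)
        yield i


def bounded(iterable, sem):
    # Hypothetical throttle: take a slot before handing each item to the
    # pool; the consumer gives one back per result, capping items in flight.
    for item in iterable:
        sem.acquire()
        yield item


def pulled_before_first_result(n=100, limit=None):
    """Return how many items imap had pulled when the first result arrived."""
    produced = []
    source = counting_source(n, produced)
    sem = None
    if limit is not None:
        sem = threading.BoundedSemaphore(limit)
        source = bounded(source, sem)
    with ThreadPool(4) as pool:
        results = pool.imap(slow_worker, source)
        next(results)
        pulled = len(produced)
        if sem is not None:
            sem.release()  # one slot back for the result just consumed
        for _ in results:
            if sem is not None:
                sem.release()
    return pulled


print("no limit:", pulled_before_first_result())
print("limit=8: ", pulled_before_first_result(limit=8))
```

Without a limit, the pool typically has pulled all or nearly all 100 items before the first result is available; with limit=8 it pulls at most 9 (8 holding slots plus one waiting on the semaphore). The limit should be at least the number of worker threads, otherwise some threads sit idle.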
