Question

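In our system, we have started to run into a problem with a task that used to work fine, but now it seems to hang and use a lot of memory (so other tasks fail and raise MemoryError).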

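Context: the sample code below used to work without problems, but with the new database huge_dataframe has become much larger. If I run the two parts separately, everything is fine, but running them together in process_data_task causes the problem. This is Python 2.7 running on Linux.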

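I suspect fork(), but how can each child process take up so much memory? huge_dataframe is deleted before the multiprocessing starts. It is also strange that do_recalculations hangs only when called from process_data_task (are the children not being joined?), yet it does not throw any exception. Any explanations or ideas for troubleshooting?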

import gc
from multiprocessing import Pool, cpu_count

def process_data_task():
    # part 1, high memory usage
    huge_dataframe = retrieve_data()
    process_table(huge_dataframe)

    # added this trying to fix the problem but it didn't help
    del huge_dataframe
    gc.collect()

    # part 2, heavy CPU usage, multiprocessing
    do_recalculations()  # recalculate items in parallel

# Multiprocessing done here
def do_recalculations():
    processes = cpu_count()
    items_to_update = [...]  # query database
    # split the work into one chunk per worker process
    work_total_list = chunkify(items_to_update, processes)
    p = Pool(processes)
    result = p.map(sub_process_func, work_total_list)
    p.close()
    p.join()
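
One way to test the fork() suspicion might be to create the Pool before the memory-heavy part runs, so that the workers are forked from a parent that is still small. On Linux, forked children start with a copy-on-write view of the parent's memory, and memory released with del is not necessarily handed back to the operating system. The sketch below only illustrates that ordering and is not a confirmed fix for this case: square_chunk, chunkify and run_pool_first are hypothetical stand-ins (the real chunkify and sub_process_func are not shown in the question), and a plain list stands in for huge_dataframe.

```python
import gc
from multiprocessing import Pool


def square_chunk(chunk):
    # Hypothetical stand-in for sub_process_func: square each number.
    return [x * x for x in chunk]


def chunkify(items, n):
    # Hypothetical stand-in for chunkify: split items into n interleaved slices.
    return [items[i::n] for i in range(n)]


def run_pool_first(n_workers=2):
    # Fork the workers while the parent process is still small.
    pool = Pool(n_workers)
    try:
        # Part 1: stand-in for the memory-heavy step, run after the fork.
        big = list(range(100000))
        total = sum(big)
        del big
        gc.collect()

        # Part 2: the workers were forked before `big` existed.
        chunks = chunkify(list(range(10)), n_workers)
        results = pool.map(square_chunk, chunks)
    finally:
        # Always shut the pool down, even if part 1 or part 2 raises.
        pool.close()
        pool.join()
    return total, results


if __name__ == "__main__":
    print(run_pool_first())
```

If the hang and the memory growth disappear with this ordering, that would point toward the children inheriting the parent's memory at fork time; if they persist, the cause is more likely elsewhere.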

Unexpectedly high memory usage with Python multiprocessing

0 answers: