Question

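I am trying to use multiprocessing on a batch of data spread across multiple files: 60 files of about 500 MB each. For each file I have to open it, read it, perform some tasks, and then write the results to a file...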

import pandas as pd
import os
import multiprocessing as mp


def filter(df_hh):
    result = ...  # do some filtering
    return result

def read(file):
    df = pd.read_csv(file)
    df_wr = []
    df_d = ...  # divide the data into steps, processed in the for loop below
    thread = mp.Pool()
    for d in df_d:
        thread.apply_async(filter, args=(d, ), callback=df_wr.append)
    thread.close()
    thread.join()

    # write data to a file

if __name__ == '__main__':
    files = ...  # get all filenames in a list
    for file in files:
        p = mp.Process(target=read, args=(file, ))
        p.start()

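This does not work: the system runs into a memory error, OSError: [Errno 12] Cannot allocate memory. I am using a server with cpu_count = 80. Any suggestions for improving the code?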
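
A likely cause, offered as a guess rather than a confirmed diagnosis: the code starts one process per file (60 of them), and each of those calls mp.Pool() with no argument, which defaults to one worker per CPU (80 here). That can mean several thousand worker processes, each created by a parent that already holds a roughly 500 MB DataFrame. One possible direction is a single flat pool with a bounded number of workers, where each worker streams its file instead of loading it whole. The sketch below uses only the standard library so that it runs as written. The names keep_row, process_file and run, the "value" column, the ".filtered" suffix and the default of 8 workers are all hypothetical. With pandas, pd.read_csv(file, chunksize=...) could play the same streaming role.

```python
import csv
import multiprocessing as mp
import sys


def keep_row(row):
    # Hypothetical stand-in for the real filtering logic:
    # keep rows whose "value" column is present and non-empty.
    return bool(row.get("value"))


def process_file(path):
    """Stream one CSV file row by row and write the kept rows to path + '.filtered'."""
    out_path = path + ".filtered"
    kept = 0
    with open(path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Only one row is held in memory at a time.
            if keep_row(row):
                writer.writerow(row)
                kept += 1
    return path, kept


def run(files, workers=8):
    # One flat pool, no nesting: at most `workers` files are processed at once,
    # so peak memory is bounded by the worker count, not by the file count.
    with mp.Pool(processes=workers) as pool:
        return list(pool.imap_unordered(process_file, files))


if __name__ == "__main__":
    files = sys.argv[1:]  # file names passed on the command line
    if files:
        for path, kept in run(files):
            print(path, kept)
```

The worker count is the knob to tune: it has to be small enough that the chosen number of files in flight fits in memory, which matters more here than using all 80 cores.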

Nesting processes inside a parent process

0 answers: