Multiprocessing in Python is 4000x slower than a single process when reading a file line by line. What is going wrong?

Time: 2018-05-18 09:30:04

Tags: python-3.x python-multiprocessing

I need to read large (10GB+) files line by line and process each line. The processing is fairly simple, so multiprocessing seemed like the way to go. However, when I set it up, it runs far slower than the linear (single-process) version. My CPU usage never goes above 50%, so it is not a processing-power issue.

I am running Python 3.6 in a Jupyter Notebook on a Mac.
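
For reference, the single-process ("linear") version I am comparing against is essentially the sketch below; file_on_my_machine stands in for the real path, and the tab split is the same fake work used in the multiprocessing version:

results = []
with open(file_on_my_machine, 'rt', newline="\n") as f:
    for line in f:
        # same fake work: split each line on tabs
        results.append(line.split("\t"))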

Here is what I have, based on the answer posted here:

from multiprocessing import Manager, Process

def do_work(in_queue, out_list):
    while True:
        line = in_queue.get()

        # exit signal: a None sentinel tells the worker to stop
        if line is None:
            return

        # fake work for testing: split each line on tabs
        elements = line.split("\t")
        out_list.append(elements)

if __name__ == "__main__":
    num_workers = 4

    manager = Manager()
    results = manager.list()
    work = manager.Queue(num_workers)

    # start the workers
    pool = []
    for i in range(num_workers):
        p = Process(target=do_work, args=(work, results))
        p.start()
        pool.append(p)

    # produce data: feed lines from the file into the work queue
    with open(file_on_my_machine, 'rt', newline="\n") as f:
        for line in f:
            work.put(line)

    # send one None sentinel per worker so each do_work() loop can
    # return; without this, join() below blocks forever
    for _ in range(num_workers):
        work.put(None)

    for p in pool:
        p.join()

    # get the results
    print(sorted(results))
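
For what it is worth, the 4000x figure is just wall-clock timing of the two versions over the same file. A minimal sketch of that kind of measurement, with hypothetical names standing in for the two code paths above:

import time

def timeit(fn, label):
    # hypothetical helper: run fn once and print its wall-clock time
    start = time.perf_counter()
    fn()
    print("%s: %.2f s" % (label, time.perf_counter() - start))

# usage (run_linear and run_multiprocessing are placeholder names
# for the single-process and Manager/Process versions above):
# timeit(run_linear, "linear")
# timeit(run_multiprocessing, "multiprocessing")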

0 Answers:

No answers