Question

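I'm trying to use the multiprocessing module to process a large CSV file. I'm using Python 2.7 and following the example here.

I ran the unmodified code (copied below for convenience) and noticed that the print statements in the worker function produce no output. Not being able to print makes it hard to understand the flow and to debug.

Can anyone explain why print doesn't work here? Does pool.map not execute print statements? I searched online but couldn't find any documentation that says so.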

import multiprocessing as mp
import itertools
import time
import csv

def worker(chunk):
    # `chunk` will be a list of CSV rows all with the same name column
    # replace this with your real computation
    print(chunk)     # <----- nothing prints
    print 'working'  # <----- nothing prints
    return len(chunk)  

def keyfunc(row):
    # `row` is one row of the CSV file.
    # replace this with the name column.
    return row[0]

def main():
    pool = mp.Pool()
    largefile = 'test.dat'
    num_chunks = 10
    results = []
    with open(largefile) as f:
        reader = csv.reader(f)
        chunks = itertools.groupby(reader, keyfunc)
        while True:
            # make a list of num_chunks chunks
            groups = [list(chunk) for key, chunk in
                      itertools.islice(chunks, num_chunks)]
            if groups:
                result = pool.map(worker, groups)
                results.extend(result)
            else:
                break
    pool.close()
    pool.join()
    print(results)

if __name__ == '__main__':
    main()

Answer 1

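This is a problem with IDLE, which you are presumably using to run the code. IDLE does only a fairly basic emulation of a terminal to handle the output of the programs you run in it. It cannot handle subprocesses, though, so while they run just fine in the background, you will never see their output.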
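One way to check that the workers really do run is to hand the debug text back to the parent process and print it there, since the parent's output is what IDLE displays. The following is only a minimal sketch under that assumption: the `run` helper, its `processes` parameter, and the sample rows are made up for illustration and are not part of the original code.

```python
from __future__ import print_function  # same code runs on Python 2.7 and 3

import multiprocessing as mp
import os


def worker(chunk):
    # Instead of printing in the child process, return the debug text
    # together with the real result so the parent can display it.
    message = 'pid %d handled %d rows' % (os.getpid(), len(chunk))
    return len(chunk), message


def run(groups, processes=2):
    # Hypothetical helper: map the groups over a small pool and clean up.
    pool = mp.Pool(processes)
    try:
        return pool.map(worker, groups)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # Made-up sample data standing in for the grouped CSV rows.
    sample = [[['a', '1'], ['a', '2']], [['b', '3']]]
    for count, message in run(sample):
        print(message)  # printed by the parent process
```

The process IDs in the messages should differ from the parent's, which shows that the work happened in child processes even though their own prints never appear.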

The simplest fix is to run the code from the command line.

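Another option might be to use a more sophisticated IDE. There are a bunch listed on the Python wiki, but I'm not sure which ones have better terminal emulation for multiprocessing output.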
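If neither option is convenient, a further workaround (not something the answer above proposes) would be to write worker diagnostics to log files, which does not depend on the IDE's console at all. The sketch below assumes one log file per worker process, named by PID, to sidestep concurrent writes to a single file. `LOG_DIR`, `log_path`, `init_worker`, and the sample rows are hypothetical names and data chosen for illustration.

```python
from __future__ import print_function  # same code runs on Python 2.7 and 3

import logging
import multiprocessing as mp
import os
import tempfile

LOG_DIR = tempfile.gettempdir()  # hypothetical location for the log files


def log_path(pid):
    # One file per process, e.g. <tmp>/worker_12345.log
    return os.path.join(LOG_DIR, 'worker_%d.log' % pid)


def init_worker():
    # Runs once in each child process when the pool starts it:
    # route that process's 'worker' log records to its own file.
    handler = logging.FileHandler(log_path(os.getpid()))
    handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
    logger = logging.getLogger('worker')
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)


def worker(chunk):
    # Takes the place of the print calls in the original worker.
    logging.getLogger('worker').debug('working on %d rows', len(chunk))
    return len(chunk)


if __name__ == '__main__':
    # Made-up sample data standing in for the grouped CSV rows.
    sample = [[['a', '1'], ['a', '2']], [['b', '3']]]
    pool = mp.Pool(2, initializer=init_worker)
    results = pool.map(worker, sample)
    pool.close()
    pool.join()
    print(results)
```

After a run, the `worker_<pid>.log` files in the temporary directory contain one line per processed group.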
