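Multiprocessing through a file tree does not produce the expected output

Time: 2017-07-11 13:44:13

Tags: python python-2.7 multiprocessing

Multiprocessing over a directory tree does not work as expected. I am trying to add all .iso files to a single set() and output that set. I know I am telling Python to return None, but I don't know how to do this without returning None. How can I get a single set out of multiprocessing?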

import itertools
import multiprocessing
import os


def worker(filename):
    data_set = set()
    if ".iso" in filename:
        data_set.add(filename)
    return data_set if len(data_set) != 0 else None


def search_for_iso(dirname=None, verbose=False, default_path="/"):
    iso_found = set()
    if dirname is None:
        pool = multiprocessing.Pool(processes=48)
        walker = os.walk(default_path)
        file_data_gen = itertools.chain.from_iterable((
            os.path.join(root, f) for f in files) for root, sub, files in walker)
        results = pool.map(worker, file_data_gen)
        return results

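As of now, it outputs the following: set(['/test.iso', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, .....]) # whole lot of None's

Expected output: set(['/test.iso'])

2 answers:

Answer 0: (score: 0)

I found a solution by looping over the results afterwards and checking in the worker whether the file name ends with .iso: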

def worker(filename):
    # Return the path for .iso files; any other file implicitly returns None.
    if filename.endswith(".iso"):
        return filename


def search_for_iso(dirname=None, verbose=False, default_path="/"):
    retval = set()
    if dirname is None:
        pool = multiprocessing.Pool(processes=48)
        walker = os.walk(default_path)
        # Lazily yield the full path of every file under default_path.
        file_data_gen = itertools.chain.from_iterable((
            os.path.join(root, f) for f in files) for root, sub, files in walker)
        results = pool.map(worker, file_data_gen)
        # Skip the None placeholders and keep only the matched paths.
        for data in results:
            if data is not None:
                retval.add(data)
        return retval

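This seems to work fine and does not appear to slow the whole process down.

Answer 1: (score: 0)

First of all, you will not gain any extra performance from using multiple processes, because you still have to wait on the file system, which is much slower than the CPU.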
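
For comparison, here is a minimal single-process sketch of the same search that skips the pool entirely. The function name `find_iso_files` is hypothetical, and matching on the `.iso` suffix is an assumption about what should count as an ISO file:

```python
import os


def find_iso_files(root_path="/"):
    """Walk root_path and collect every path ending in .iso into one set."""
    found = set()
    for root, _dirs, files in os.walk(root_path):
        for name in files:
            # Assumption: a file counts as an ISO if its name ends with ".iso".
            if name.endswith(".iso"):
                found.add(os.path.join(root, name))
    return found
```

Because the result is built directly as a set, there are no None placeholders to filter out afterwards.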

For your current code, just return the set, even if it is empty:

def worker(filename):
    data_set = set()
    if ".iso" in filename:
        data_set.add(filename)
    return data_set
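
With this worker, `pool.map` returns a list of sets, one per file, and most of them are empty. A minimal sketch of how that list could be merged into a single set follows. `merge_results` is a hypothetical helper that is not part of the original code, and the `results` list here is a stand-in for what `pool.map(worker, file_data_gen)` would return:

```python
def worker(filename):
    """Return a set holding filename if it contains ".iso", else an empty set."""
    data_set = set()
    if ".iso" in filename:
        data_set.add(filename)
    return data_set


def merge_results(results):
    """Union an iterable of sets into one set (hypothetical helper)."""
    return set().union(*results)


# Stand-in for the list that pool.map(worker, file_data_gen) would return.
results = [worker(name) for name in ["/test.iso", "/etc/hosts", "/tmp/a.txt"]]
merged = merge_results(results)
```

Empty sets contribute nothing to the union, so no explicit filtering step is needed.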