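Multiprocessing over a directory tree is not working as expected. I am trying to add all .iso files to a single set() and output that set. I know I am telling Python to return None, but I don't know how to do this without returning None. How can I output a single set from multiprocessing?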
import itertools
import multiprocessing
import os

def worker(filename):
    # Returns a one-element set for .iso paths, otherwise None.
    data_set = set()
    if ".iso" in filename:
        data_set.add(filename)
    return data_set if len(data_set) != 0 else None

def search_for_iso(dirname=None, verbose=False, default_path="/"):
    iso_found = set()
    if dirname is None:
        pool = multiprocessing.Pool(processes=48)
        walker = os.walk(default_path)
        # Lazily yield the full path of every file under default_path.
        file_data_gen = itertools.chain.from_iterable((
            os.path.join(root, f) for f in files) for root, sub, files in walker)
        results = pool.map(worker, file_data_gen)
        return results
As of now, it outputs the following: set(['/test.iso', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, .....]) # whole lot of None's
Expected output: set(['/test.iso'])
Answer 0: (score: 0)
I found a solution by post-processing the results and checking whether the filename ends with .iso:
def worker(filename):
    # Implicitly returns None for anything that is not an .iso file.
    if filename.endswith(".iso"):
        return filename

def search_for_iso(dirname=None, verbose=False, default_path="/"):
    retval = set()
    if dirname is None:
        pool = multiprocessing.Pool(processes=48)
        walker = os.walk(default_path)
        file_data_gen = itertools.chain.from_iterable((
            os.path.join(root, f) for f in files) for root, sub, files in walker)
        results = pool.map(worker, file_data_gen)
        # Drop the None placeholders and keep only the matching paths.
        for data in results:
            if data is not None:
                retval.add(data)
    return retval
This seems to work fine and does not appear to slow down the overall process.
Answer 1: (score: 0)
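First, you won't gain any extra performance by using multiple processes, because you are still waiting on the filesystem, which is much slower than the CPU.
For your current code, just return the set, even if it is empty: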
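If the work really is I/O-bound, a plain single-process walk may be all that is needed. Here is a minimal sketch of that idea; the function name find_iso_files is made up for illustration and is not from the question:

```python
import os

def find_iso_files(top):
    """Walk `top` in a single process and collect paths ending in .iso."""
    found = set()
    for root, _dirs, files in os.walk(top):
        for name in files:
            # Same suffix check as in the other answer, applied inline.
            if name.endswith(".iso"):
                found.add(os.path.join(root, name))
    return found
```

Calling find_iso_files("/") would cover the same tree as default_path="/" in the question, with no pool and no None values to filter afterwards.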
def worker(filename):
    data_set = set()
    if ".iso" in filename:
        data_set.add(filename)
    # Always return a set, so the caller never sees None.
    return data_set
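Note that with a worker like this, pool.map still gives back a list of small sets (most of them empty), not one set, so the caller has to merge them. A rough sketch of that merge step follows; merge_results is a hypothetical helper name, and the built-in map stands in for pool.map so the snippet runs without creating a pool:

```python
def worker(filename):
    # Same worker as above: always returns a set, possibly empty.
    data_set = set()
    if ".iso" in filename:
        data_set.add(filename)
    return data_set

def merge_results(results):
    """Union an iterable of sets into a single set (empty input gives an empty set)."""
    return set().union(*results)

if __name__ == "__main__":
    # Stand-in for results = pool.map(worker, file_data_gen)
    results = list(map(worker, ["/test.iso", "/etc/hosts", "/tmp/notes.txt"]))
    print(merge_results(results))
```

In search_for_iso this would presumably replace the final return with something like return merge_results(results).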