Question

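What is a good way to evaluate the elements of a lazy generator in parallel, running some (CPU-bound) evaluation function on each one? I don't need the evaluation function mapped over every element, only the single result that satisfies the condition. The multiprocessing module has plenty of examples, but I can't figure out how to apply it to my problem.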

Example:

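I have a lazy generator of candidate keys, and I want to test each one with an evaluation function that checks whether the key, when used with some message in some cryptographic algorithm, produces the expected output.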

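On my machine, the example code below takes 53 seconds to run worker(keySpace). I would like to spawn cpu_count threads to do this in parallel, so that the runtime drops to roughly 1/cpu_count of that. As soon as one worker returns a result, all threads can be stopped. The size of keySpace is known in advance, and it would be nice to report the current progress of each thread now and then.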
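Because keySpaceSize is known in advance, one way to get cpu_count independent slices is to split the integer range instead of the lazy `map` object, which is a single-pass iterator and cannot be sliced without consuming it. A minimal sketch with a hypothetical helper `split_range`; each worker could then build its own keys with `pack("<I", x) for x in range(start, stop)`.

```python
from multiprocessing import cpu_count


def split_range(size, parts):
    """Split range(size) into `parts` contiguous (start, stop) slices.

    The first size % parts slices get one extra element, so slice
    lengths differ by at most one and together cover range(size) exactly.
    """
    base, extra = divmod(size, parts)
    bounds = []
    start = 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


if __name__ == "__main__":
    # One slice per CPU for the question's 2**24 keys.
    for start, stop in split_range(2**24, cpu_count()):
        print(start, stop)
```

For example, split_range(2**24, 4) gives [(0, 4194304), (4194304, 8388608), (8388608, 12582912), (12582912, 16777216)]. Only two integers per slice need to reach a worker, instead of every key.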

from itertools import cycle, islice
from timeit import default_timer as timer
from struct import pack
from multiprocessing import Pool, cpu_count
from time import sleep

#all keys from b'\x00\x00\x00\x00' to b'\xff\xff\xff\x00'
keySpaceSize = 2**24
keySpace = map(lambda x: pack("<I", x), range(keySpaceSize))

#the ciphertext is 'HelloWorld' xor'd with 'ABC\0'
knownPlaintext = b'HelloWorld'
ciphertext = b"\t'/l.\x15,r-&"

#let's use xor encryption as an example
def someSymmetricAlgorithm(key, message):
    #iterating over bytes already yields ints, so no int() conversion is needed
    return bytes(k ^ m for (k, m) in zip(cycle(key), message))

#returns the given key when it produces the known plaintext from the ciphertext
def testKey(key):
    plaintext = someSymmetricAlgorithm(key, ciphertext)
    if plaintext == knownPlaintext:
        return key

#tests each element of an iterable
def worker(keySpaceSlice):
    for key in keySpaceSlice:
        result = testKey(key)
        if result:
            return result

start = timer()

#how to parallelize this?
result = worker(keySpace)

elapsed = timer() - start

if result:
    print("found key: %s in %f seconds" % (result, elapsed))
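One way to parallelize the loop above is `multiprocessing.Pool.imap_unordered` over small integer ranges, a process-based approach rather than threads. In this sketch each task tests one range and returns the key or None; the parent stops at the first non-None result, and leaving the `with` block calls `Pool.terminate()`, which stops the workers still searching. The names `search_range`, `parallel_search`, `chunk` and `report_every` are hypothetical, and the code assumes the "fork" start method (Linux/POSIX). Progress here is the overall number of finished ranges, not per-worker progress.

```python
import multiprocessing as mp
from itertools import cycle
from struct import pack


def xor_cipher(key, message):
    # Same toy cipher as someSymmetricAlgorithm; iterating bytes yields ints.
    return bytes(k ^ m for k, m in zip(cycle(key), message))


def search_range(task):
    """Test every 4-byte little-endian key in [start, stop).

    Returns the matching key, or None if this range contains no match.
    """
    start, stop, ciphertext, known_plaintext = task
    for x in range(start, stop):
        key = pack("<I", x)
        if xor_cipher(key, ciphertext) == known_plaintext:
            return key
    return None


def parallel_search(ciphertext, known_plaintext, key_space_size,
                    processes=None, chunk=2**16, report_every=0):
    processes = processes or mp.cpu_count()
    # Send tiny (start, stop) bounds to the workers; keys are built there.
    tasks = ((lo, min(lo + chunk, key_space_size), ciphertext, known_plaintext)
             for lo in range(0, key_space_size, chunk))
    total = -(-key_space_size // chunk)  # number of ranges, rounded up
    ctx = mp.get_context("fork")  # assumption: Linux/POSIX start method
    with ctx.Pool(processes) as pool:
        # imap_unordered yields each range's result as soon as it finishes.
        results = pool.imap_unordered(search_range, tasks)
        for done, result in enumerate(results, 1):
            if result is not None:
                # Leaving the with-block calls pool.terminate(),
                # which stops the workers that are still searching.
                return result
            if report_every and done % report_every == 0:
                print("searched %d/%d ranges" % (done, total))
    return None
```

With the question's data, parallel_search(ciphertext, knownPlaintext, keySpaceSize) should return b'ABC\x00'. A smaller `chunk` means the search notices a hit sooner and reports progress more finely, at the cost of more task overhead.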

Side question:

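Initially, I thought I would iterate over the generator and use Pool.imap. Unexpectedly, even with chunksize = keySpaceSize / poolSize, it seemed to spawn a process for every evaluation. Am I misunderstanding what "chunksize" means? I thought it would only create one thread per chunk.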
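On the chunksize question: a `Pool(processes)` starts a fixed set of worker processes once, when the pool is created. `chunksize` only controls how many items are sent to a worker in one batch; the function is still called once per item, inside one of those existing workers. It also has to be an integer, e.g. keySpaceSize // poolSize, because `/` produces a float in Python 3. The small sketch below, with hypothetical names `square_with_pid` and `run` and assuming the "fork" start method, records which process handled each call.

```python
import multiprocessing as mp
import os


def square_with_pid(x):
    # Called once per item; also reports which worker process ran it.
    return x * x, os.getpid()


def run(items, processes=2, chunksize=1):
    ctx = mp.get_context("fork")  # assumption: Linux/POSIX start method
    # The worker processes are started here, once, not per item or per chunk.
    with ctx.Pool(processes) as pool:
        results = list(pool.imap(square_with_pid, items, chunksize=chunksize))
    values = [value for value, _ in results]
    pids = {pid for _, pid in results}
    return values, pids


if __name__ == "__main__":
    n_items, n_procs = 1000, 4
    values, pids = run(range(n_items), processes=n_procs,
                       chunksize=n_items // n_procs)  # integer, not n_items / n_procs
    print(len(values), "calls ran in", len(pids), "worker process(es)")
```

The number of distinct PIDs never exceeds `processes`, however many items are mapped.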

Environment:

Python 3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:15:05) [MSC v.1600 32 bit (Intel)] on win32

I would like a threaded solution that works on Linux.
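In CPython, only one thread executes Python bytecode at a time (the global interpreter lock), so threads running pure-Python CPU-bound code like testKey will not approach the 1/cpu_count runtime. A common substitute is one process per CPU. The sketch below swaps threads for `multiprocessing.Process` workers, each testing one contiguous slice; a shared `Event` tells the others to stop once a key is found, and a shared `Array` holds per-worker progress that the parent prints periodically. The names `worker`, `parallel_search`, `check_every` and `report_interval` are hypothetical, and the "fork" start method (Linux/POSIX) is assumed.

```python
import ctypes
import multiprocessing as mp
from itertools import cycle
from struct import pack


def xor_cipher(key, message):
    return bytes(k ^ m for k, m in zip(cycle(key), message))


def worker(idx, start, stop, ciphertext, plaintext, found, result, progress,
           check_every=4096):
    """Test keys in [start, stop) until this or another worker finds a match."""
    for x in range(start, stop):
        if (x - start) % check_every == 0:
            progress[idx] = x - start   # keys tested so far by this worker
            if found.is_set():          # another worker already succeeded
                return
        if xor_cipher(pack("<I", x), ciphertext) == plaintext:
            result.value = x            # publish the winning key as an integer
            found.set()                 # tell the other workers to stop
            return
    progress[idx] = stop - start        # whole slice tested, no match here


def parallel_search(ciphertext, plaintext, key_space_size, workers=None,
                    report_interval=1.0):
    workers = workers or mp.cpu_count()
    ctx = mp.get_context("fork")        # assumption: Linux/POSIX start method
    found = ctx.Event()
    result = ctx.Value(ctypes.c_longlong, -1)
    progress = ctx.Array(ctypes.c_longlong, workers)  # zero-initialised
    base, extra = divmod(key_space_size, workers)
    procs, start = [], 0
    for i in range(workers):
        stop = start + base + (1 if i < extra else 0)
        p = ctx.Process(target=worker,
                        args=(i, start, stop, ciphertext, plaintext,
                              found, result, progress))
        p.start()
        procs.append(p)
        start = stop
    while any(p.is_alive() for p in procs):
        # Returns True as soon as a key is found, else after report_interval.
        if found.wait(report_interval):
            break
        print("keys tested per worker:", progress[:])
    for p in procs:
        p.join()                        # others exit once they see `found`
    return pack("<I", result.value) if result.value >= 0 else None
```

The workers poll `found` only every `check_every` keys, so they may test a few thousand extra keys after the hit; lowering it makes them stop sooner at a small cost per key.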

Multithreaded evaluation of a lazy generator in Python (3.x) for brute-forcing
