Question

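Currently, this nested for loop takes almost an hour to complete. I would like to rewrite it so that parts of it run in parallel. I have not found any answers on how to do this for a nested loop like the one below. Anyone who can point me in the right direction would be greatly appreciated.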

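    import re
    import time

    # Used to update the "Software Name" values in softwareCollection using the patterns in regexCollection
    startTime = time.time()
    # Projection values must be valid Python: 1 includes a field (null is not a Python name)
    cursor = softwareCollection.find({}, {"Software Name": 1, "Computer Name": 1, "Version": 1, "Publisher": 1, "reged": 1}, no_cursor_timeout=True)
    for x in cursor:
        for y in regexCollection.find({}, {"regName": 1, "newName": 1}, no_cursor_timeout=True):
            try:
                regExp = re.compile(y["regName"])
            except re.error:
                # Invalid pattern: report it and stop checking rules for this document
                print(y["regName"])
                break
            oldName = x["Software Name"]
            newName = y["newName"]
            if regExp.search(oldName):
                x["Software Name"] = newName
                x["reged"] = "true"
                # save() replaces the stored document with x; PyMongo 4 removed it in favor of replace_one()
                softwareCollection.save(x)
                break
    print((time.time() - startTime) / 60)  # elapsed minutes
    cursor.close()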

Answer 1

Depending on the number of iterations over x, you can spawn one thread per x step, and that thread iterates over y.

First, define the function to run, which depends on x:

def y_iteration(x):
    for y in ... :
        ...

Then spawn a thread running this function on each iteration over x:

for x in ... :
    _thread.start_new_thread(y_iteration, (x,))

This is a very basic example using the low-level _thread module.

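Now you may want the main thread to wait for the work to finish, in which case you will need the threading module. You would probably put the iteration over x in a thread and join it: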

def x_iteration():
    for x in ... :
        threading.Thread(target=y_iteration, args=(x,)).start()

thread = threading.Thread(target=x_iteration)
thread.start()
thread.join()
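Note that in the snippet above, thread.join() only waits for the thread that spawns the workers, not for the workers themselves. A minimal sketch of one way to wait for every worker follows. It keeps a reference to each thread and joins them all. The in-memory lists `rules` and `software` and the helper `run_all` are hypothetical stand-ins for the two collections in the question, not part of the original code.

```python
import re
import threading

# Hypothetical in-memory stand-ins for regexCollection and softwareCollection.
rules = [
    {"regName": r"^Adobe\s+Reader", "newName": "Adobe Reader"},
    {"regName": r"^Mozilla\s+Firefox", "newName": "Mozilla Firefox"},
]
software = [
    {"Software Name": "Adobe Reader 11.0"},
    {"Software Name": "Mozilla Firefox 52 (x64)"},
    {"Software Name": "7-Zip 16.04"},
]

def y_iteration(x):
    """Rename x in place using the first rule whose pattern matches."""
    for y in rules:
        if re.search(y["regName"], x["Software Name"]):
            x["Software Name"] = y["newName"]
            x["reged"] = "true"
            break

def run_all(items):
    """Start one thread per item, then block until every thread has finished."""
    threads = [threading.Thread(target=y_iteration, args=(x,)) for x in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run_all(software)
```

Each thread touches only its own dictionary, so no lock is needed in this sketch. Shared state would need one.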

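After that, it depends on how many iterations over x you plan to run (see How many threads is too many?). If that number is large, you may want to create a pool of, say, a hundred workers and feed them y_iteration. When every worker is busy, wait until one becomes free.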
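The worker-pool idea can be sketched with the standard library's concurrent.futures.ThreadPoolExecutor, which is swapped in here for a hand-built pool. The executor queues tasks and hands them to a worker as soon as one is free. The `RULES` list, the sample `names`, and the pool size of 4 are assumptions for illustration only. Keep in mind that in CPython, threads mainly help while waiting on I/O such as database round trips, and the regex matching itself does not run in parallel across threads.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rules: (pattern, replacement name) pairs.
RULES = [
    (r"^Adobe\s+Reader", "Adobe Reader"),
    (r"^Mozilla\s+Firefox", "Mozilla Firefox"),
]

def y_iteration(name):
    """Return the replacement for the first matching rule, or the name unchanged."""
    for pattern, new_name in RULES:
        if re.search(pattern, name):
            return new_name
    return name

names = ["Adobe Reader 11.0", "7-Zip 16.04", "Mozilla Firefox 52 (x64)"]

# The pool size is arbitrary here; pending tasks wait until a worker is free.
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in the same order as the inputs.
    renamed = list(pool.map(y_iteration, names))

print(renamed)
```

Leaving the with block waits for all submitted tasks, so no explicit join is required.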

Answer 2

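So I was able to get it running, and it works twice as fast as the sequential version. My concern is that the process still takes 4 hours to complete. Is there a way to make this more efficient, or should I expect it to take this long?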

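import re
import time

# Parallel(n_jobs=...)(delayed(...)) is joblib's API; the import is assumed
from joblib import Parallel, delayed

# Used to update the SoftwareName values in softwareCollection using the patterns in regexCollection
def foo(x):
    for y in regexCollection.find({}, {"regName": 1, "newName": 1}, no_cursor_timeout=True):
        try:
            regExp = re.compile(y["regName"])
        except re.error:
            # Invalid pattern: report it and stop checking rules for this document
            print(y["regName"])
            break
        oldName = x["SoftwareName"]
        newName = y["newName"]
        if regExp.search(oldName):
            x["SoftwareName"] = newName
            x["field4"] = "reged"
            # save() exists only in PyMongo 3.x and earlier; PyMongo 4 uses replace_one()
            softwareCollection.save(x)
            break


if __name__ == '__main__':
    startTime = time.time()
    cursor = softwareCollection.find()
    Parallel(n_jobs=4)(delayed(foo)(x) for x in cursor)

    print((time.time() - startTime) / 60)  # elapsed minutes
    cursor.close()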
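On the efficiency question, one thing that may be worth trying: foo above queries regexCollection again for every software document. Loading and compiling the rules once, then reusing them, might remove much of that repeated work. The sketch below assumes the rules fit in memory and uses plain dictionaries in place of MongoDB. The helpers `compile_rules` and `rename` and the sample data are hypothetical. Unlike the original loop, which stops checking rules on an invalid pattern, this version skips the bad pattern and keeps going.

```python
import re

def compile_rules(rule_docs):
    """Compile every rule once; report and skip patterns that fail to compile."""
    compiled = []
    for doc in rule_docs:
        try:
            compiled.append((re.compile(doc["regName"]), doc["newName"]))
        except re.error:
            print("skipping invalid pattern:", doc["regName"])
    return compiled

def rename(name, compiled_rules):
    """Return the new name from the first matching rule, or None if none match."""
    for pattern, new_name in compiled_rules:
        if pattern.search(name):
            return new_name
    return None

# Hypothetical rule documents, shaped like those in regexCollection.
rule_docs = [
    {"regName": r"^Adobe\s+Reader", "newName": "Adobe Reader"},
    {"regName": r"([unclosed", "newName": "Broken"},  # deliberately invalid
    {"regName": r"Firefox", "newName": "Mozilla Firefox"},
]

rules = compile_rules(rule_docs)
print(rename("Adobe Reader XI", rules))
print(rename("7-Zip 16.04", rules))
```

With real collections, the list passed to compile_rules would come from a single regexCollection.find() call made before the loop over the software documents.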

Parallel for loop, Python
