GitPython causes concurrent.futures.ThreadPoolExecutor to ignore max_workers

Asked: 2015-01-23 00:05:00

Tags: python git concurrency gitpython concurrent.futures

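I'm writing some Python code to perform operations on a large number of git repositories in parallel. To do this, I'm trying to combine concurrent.futures with GitPython, cloning each repository in a separate future. This is with the built-in Python 2.7.6 on OS X 10.10, plus GitPython 0.3.5 and futures 2.2.0 (the backport of concurrent.futures to 2.7), both installed via pip.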

A simple example of the code I'm using is as follows:

import time
from concurrent import futures
import shutil
import os
from git import Repo


def wait_then_return(i):
    print('called: %s' % i)
    time.sleep(2)
    return i


def clone_then_return(i):
    print('called: %s' % i)
    path = os.path.join('/tmp', str(i))
    os.mkdir(path)
    # clone some arbitrary repo
    Repo.clone_from('https://github.com/ros/rosdistro', path)
    shutil.rmtree(path)
    return i



if __name__ == "__main__":

    tasks = 20
    workers = 4

    with futures.ThreadPoolExecutor(max_workers=workers) as executor:

        # this works as expected... delaying work until a thread is available
        # fs = [executor.submit(wait_then_return, i) for i in range(0, tasks)]
        # this doesn't... all 20 come in quick succession
        fs = [executor.submit(clone_then_return, i) for i in range(0, tasks)]

        for future in futures.as_completed(fs):
            print('result: %s' % future.result())

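When I submit the wait_then_return function to the executor, I get the expected behavior: the "called" prints first appear in a group of four, and then continue roughly along those lines until all the futures have completed. If I switch to clone_then_return, it looks as though the executor ignores the max_workers argument and runs all 20 futures in parallel.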

What could be causing this?

1 Answer:

Answer 0: (score: 0)

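It turned out that the git call I was using had some authentication issues, which caused the futures to complete very quickly. Since each task failed almost immediately, the four workers got through all 20 tasks in quick succession, which only looked as if max_workers was being ignored. All is sane in the world of concurrency.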
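
For anyone hitting something similar, here is a minimal sketch of one way to check both things at once: whether the pool really exceeds its limit, and whether the tasks are quietly failing. The names run_all and fail_fast are hypothetical, and the RuntimeError is only a stand-in for whatever the git call raised (the post does not say which exception it was). The sketch counts how many tasks are inside the worker function at the same moment and collects each future's exception via Future.exception() instead of letting it go unnoticed.

```python
import threading
from concurrent import futures


def fail_fast(i):
    # Hypothetical stand-in for a clone that fails immediately,
    # e.g. because of an authentication problem.
    raise RuntimeError('simulated clone failure for task %d' % i)


def run_all(task, tasks=20, workers=4):
    """Run task(i) for i in range(tasks); report results, errors and peak concurrency."""
    lock = threading.Lock()
    state = {'running': 0, 'peak': 0}

    def tracked(i):
        # Record how many tasks are executing at the same time.
        with lock:
            state['running'] += 1
            state['peak'] = max(state['peak'], state['running'])
        try:
            return task(i)
        finally:
            with lock:
                state['running'] -= 1

    results, errors = {}, {}
    with futures.ThreadPoolExecutor(max_workers=workers) as executor:
        fs = dict((executor.submit(tracked, i), i) for i in range(tasks))
        for future in futures.as_completed(fs):
            i = fs[future]
            exc = future.exception()  # None if the task succeeded
            if exc is None:
                results[i] = future.result()
            else:
                errors[i] = exc
    return results, errors, state['peak']


if __name__ == "__main__":
    results, errors, peak = run_all(fail_fast)
    print('succeeded: %d, failed: %d, peak concurrency: %d'
          % (len(results), len(errors), peak))
```

If the peak never exceeds the worker count while every task lands in errors, the executor is behaving correctly and the tasks themselves are the thing to investigate.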