Question

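I am using Python's threading.Thread to spawn threads that run a small utility for each filename from os.walk() and capture its output. I tried to limit the number of threads with:

ThreadLimiter = threading.BoundedSemaphore(3)

and

ThreadLimiter.acquire()

at the start of the run method, and

ThreadLimiter.release()

at the end of the run method.

But when I run the Python program I still get the following error message. Any suggestions for improvement?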

bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable

Answer 1

Use a thread pool and save yourself a lot of work! Here I run md5sum on each file:

import os
import multiprocessing.pool
import subprocess as subp

def walker(path):
    """Walk the file system returning file names"""
    for dirpath, dirs, files in os.walk(path):
        for fn in files:
            yield os.path.join(dirpath, fn)

def worker(filename):
    """get md5 sum of file"""
    p = subp.Popen(['md5sum', filename], stdin=subp.PIPE,
            stdout=subp.PIPE, stderr=subp.PIPE)
    out, err = p.communicate()
    return filename, p.returncode, out, err

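# Three worker threads, so at most three md5sum processes run at once
pool = multiprocessing.pool.ThreadPool(3)

for filename, returncode, out, err in pool.imap(worker, walker('.'), chunksize=1):
    # communicate() returns bytes on Python 3, so decode before printing
    print(filename, out.decode().strip())

pool.close()
pool.join()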
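
For comparison, roughly the same pattern can be written with concurrent.futures.ThreadPoolExecutor from the standard library. This is only a sketch of an alternative, not the approach used above: the names run_tool and run_all are made up for illustration, and the command to run is passed in as a list so that nothing is assumed about which utility is installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_tool(command, filename):
    """Run command + [filename]; return (filename, returncode, stdout, stderr)."""
    result = subprocess.run(
        command + [filename],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return filename, result.returncode, result.stdout, result.stderr


def run_all(command, filenames, max_workers=3):
    """Run the command once per filename, with at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as the input filenames
        return list(executor.map(lambda fn: run_tool(command, fn), filenames))
```

Something like run_all(['md5sum'], walker('.')) would then give one result tuple per file, assuming md5sum is on the PATH.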

Answer 2

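By the time run executes, the thread has already been started. Using the limiter inside run does not limit the number of running threads; it limits the number of threads that can finish, which makes the problem worse!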
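
A small experiment shows the effect. This is a sketch under assumptions: the Worker class and count_live_threads are invented for the demonstration, and an Event stands in for the utility so that no worker can finish early. Every thread exists even though only three of them can hold the semaphore.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, limiter, gate):
        super().__init__()
        self.limiter = limiter
        self.gate = gate

    def run(self):
        self.limiter.acquire()      # too late: this thread already exists
        try:
            self.gate.wait()        # stand-in for running the utility
        finally:
            self.limiter.release()


def count_live_threads(n=10, limit=3):
    """Start n workers and count how many are alive while they are all blocked."""
    limiter = threading.BoundedSemaphore(limit)
    gate = threading.Event()
    workers = [Worker(limiter, gate) for _ in range(n)]
    for w in workers:
        w.start()                   # nothing here waits for a free slot
    alive = sum(w.is_alive() for w in workers)
    gate.set()                      # let every worker finish
    for w in workers:
        w.join()
    return alive
```

Calling count_live_threads(10) reports all 10 threads alive, not 3. With one thread per file under os.walk, that count grows with the size of the tree.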

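Either:

Modify start to delay starting the thread.
In the os.walk loop, keep a list of active threads and block with thread.join when there are too many.
Use a thread pool, such as multiprocessing.pool.ThreadPool.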
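
The second option could look roughly like this. It is a minimal sketch, not a tested drop-in: run_with_join, work and max_threads are hypothetical names, and work stands for whatever runs the utility on one filename.

```python
import threading


def run_with_join(work, items, max_threads=3):
    """Call work(item) in its own thread, keeping at most max_threads alive."""
    active = []
    for item in items:
        # Forget threads that have already finished
        active = [t for t in active if t.is_alive()]
        if len(active) >= max_threads:
            # Too many live threads: block until the oldest one is done
            active.pop(0).join()
        t = threading.Thread(target=work, args=(item,))
        t.start()
        active.append(t)
    # Wait for the stragglers before returning
    for t in active:
        t.join()
```

Because the loop blocks in join before creating the next thread, the number of threads is capped in the place where they are created. Joining the oldest thread is a simplification: if that one is slow, finished slots sit idle for a while, which is one reason a thread pool tends to be the easier choice.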

Limiting Python threads: Resource temporarily unavailable
