Question

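I have a large dataset in a list that I need to do some work on.

I want to start x threads at any given time to work through the list until everything in it has been popped.

I know how to start x threads (say, 20) at a given time (by using thread1 ... thread20.start()).

But how do I make it start a new thread when one of the first 20 threads finishes? That way, 20 threads are running at any given time until the list is empty.

What I have so far: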

import threading

class queryData(threading.Thread):
    def __init__(self, threadID):
        threading.Thread.__init__(self)
        self.threadID = threadID

    def run(self):
        global lst
        # Get one trade from the shared list
        trade = lst.pop()
        tradeId = trade[0][1][:6]
        print(tradeId)


thread1 = queryData(1)
thread1.start()

Update

I have the following code:

threads = []
lock = threading.Lock()

for i in range(20):
    threads.append(queryData(i))
for thread in threads:
    thread.start()

while len(lst) > 0:
    for idx, thread in enumerate(threads):
        thread.join()  # blocks on this specific thread, in list order
        lock.acquire()
        threads[idx] = queryData(idx)  # replace the finished thread
        threads[idx].start()
        lock.release()

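Now it starts 20 threads at the beginning, and then keeps starting a new thread whenever one finishes.

However, it is not efficient, because it waits for the first thread in the list to finish, then the second, and so on.

Is there a better way?

Basically, I need: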

-Start 20 threads:
-While list is not empty:
   -wait for 1 of the 20 threads to finish
   -reuse or start a new thread

Answer 1

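As I suggested in a comment, I think multiprocessing.pool.ThreadPool is appropriate here, because it automatically handles most of the thread management that you are doing by hand in your code. Once all the tasks have been queued for processing through calls to ThreadPool's apply_async() method, the only thing left to do is wait until they have all finished executing (unless there is something else your code could be doing in the meantime, of course).

I have adapted the code from the linked answer to another, related question so that it is closer to what you are doing and easier to understand in the current context.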
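
A minimal sketch of the ThreadPool approach might look like the following. It is not necessarily the code from the linked answer: the function names query_data and process_all, and the sample contents of lst, are assumptions made for illustration, and the slicing only mimics the trade[0][1][:6] lookup from the question.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in for the real dataset; each item is shaped so that
# trade[0][1][:6] works the same way as in the question.
lst = [[("id", "%06d-extra" % n)] for n in range(100)]

def query_data(trade):
    """Process one trade and return its ID (placeholder for the real work)."""
    return trade[0][1][:6]

def process_all(trades, num_threads=20):
    pool = ThreadPool(processes=num_threads)
    # Queue every item; the pool hands each one to whichever worker is free.
    results = [pool.apply_async(query_data, (trade,)) for trade in trades]
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all queued tasks to finish
    # Results come back in submission order, not completion order.
    return [r.get() for r in results]

if __name__ == "__main__":
    trade_ids = process_all(lst)
    print(len(trade_ids), "trades processed")
```

With this shape there are never more than num_threads workers running, and no thread has to be joined and restarted by hand.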

Answer 2

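You can wait for a thread to finish with thread.join(). This call blocks until that thread has finished, at which point you can create a new one.

However, instead of spawning a new thread each time, why not recycle the existing ones?

This can be done by using tasks. You keep a list of tasks in a shared collection, and when one of the threads finishes a task, it retrieves another one from that collection.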
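
The task-recycling idea could be sketched roughly as below, using queue.Queue as the shared collection. The names worker, run_workers, and sample_trades are hypothetical, and the slicing is only a placeholder for the real per-trade work.

```python
import queue
import threading

def worker(tasks, results, results_lock):
    """Keep pulling tasks until the queue is empty, then exit."""
    while True:
        try:
            trade = tasks.get_nowait()
        except queue.Empty:
            return  # nothing left to do, so this thread ends
        trade_id = trade[0][1][:6]  # placeholder for the real work
        with results_lock:
            results.append(trade_id)

def run_workers(trades, num_threads=20):
    # Fill the queue before starting any thread, so an empty queue
    # reliably means that all work has been handed out.
    tasks = queue.Queue()
    for trade in trades:
        tasks.put(trade)
    results = []
    results_lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, results_lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # each thread returns only once the queue is drained
    return results

if __name__ == "__main__":
    # Hypothetical sample data shaped like the question's trades.
    sample_trades = [[("id", "%06d-extra" % n)] for n in range(100)]
    print(len(run_workers(sample_trades)), "trades processed")
```

Here the same 20 threads stay alive for the whole run, so joining them in order at the end costs nothing: each one only returns when there is no work left.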

How do I start a new thread when an old one finishes?