How to make threads wait in Python?

Date: 2018-08-26 08:53:51

Tags: python multithreading

I have the following code:

        with ThreadPoolExecutor(max_workers=num_of_pages) as executor:
            futh = [executor.submit(self.getdata2, page, hed, data, apifolder, additional) for page in pages]
            for data in as_completed(futh):
                datarALL = datarALL + data.result()
        return datarALL

num_of_pages is not fixed, but is usually around 250. The getdata2 func issues a GET request and returns the results for each page.

The problem is that all 250 pages (threads) are created at once, which means 250 GET requests fire simultaneously. This overloads the server, so its responses are delayed, which in turn causes GET calls to fail and be retried over and over. I want to avoid that.

I want to create some kind of lock that blocks a thread/page from issuing its GET request if more than 10 requests are already active. In that case it would wait until a slot becomes available.

Something like:

executing_now = []
def getdata2(...):
    ...
    while len(executing_now) > 10:
        sleep(10)
    executing_now.append(page)
    response = requests.get(url, data=data, headers=hed, verify=False)
    ....
    executing_now.remove(page)
    return ...

Does Python have a mechanism for this purpose? It would require the threads to check shared memory... and I want to avoid multithreading problems such as deadlocks, etc.

Basically: throttle the GET calls, limiting how many threads execute them at the same time.
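For the record, the standard mechanism for exactly this kind of limit is `threading.BoundedSemaphore`: threads block in `acquire()` once all permits are taken, with no busy-wait loop and no unsynchronized shared list. Below is a minimal, self-contained sketch of the idea; `fetch` is a hypothetical stand-in for the real GET call (the `time.sleep` simulates network latency), and the `active`/`peak` counters exist only to demonstrate that the limit holds:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# At most 10 holders at a time; extra threads block in acquire()
# until a slot frees up - no polling, no shared-list races.
request_slots = threading.BoundedSemaphore(10)

active = 0   # instrumentation only: how many "requests" are in flight
peak = 0     # instrumentation only: the highest concurrency observed
lock = threading.Lock()

def fetch(page):
    """Hypothetical stand-in for the real GET request."""
    global active, peak
    with request_slots:          # blocks if 10 requests are already in flight
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)         # here the real code would call requests.get(...)
        with lock:
            active -= 1
    return page

# Even with 50 worker threads, no more than 10 run the GET concurrently.
with ThreadPoolExecutor(max_workers=50) as executor:
    results = list(executor.map(fetch, range(100)))

print(peak <= 10)  # True
```

Note that with a plain `ThreadPoolExecutor` the simplest fix is often just `max_workers=10`, since the executor itself queues pending submissions; the semaphore variant is useful when the pool size must stay large for other work.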

1 Answer:

Answer 0 (score: 1)

You can use a queue to "prepare" all your pages up front, and then limit the thread pool to any number of threads you like, since each thread will pull the pages it needs from the queue:

import queue

# prepare all your page objects here
pages_queue = queue.Queue()
for page in pages:
    pages_queue.put(page)

# ThreadPool - each thread takes one page from the queue and, when done, fetches the next.
# Submit one task per worker (not a single task), so all 10 threads drain the queue.
with ThreadPoolExecutor(max_workers=10) as executor:
    futh = [executor.submit(self.getdata2, pages_queue, hed, data, apifolder, additional)
            for _ in range(10)]
    for data in as_completed(futh):
        datarALL = datarALL + data.result()
return datarALL

def getdata2(self, pages_queue, hed, data, apifolder, additional):
    ...
    results = []
    try:
        while True:  # non-blocking get raises queue.Empty when the queue is drained
            page = pages_queue.get_nowait()
            response = requests.get(page.url, data=data, headers=hed, verify=False)
            ....
            results.append(...)  # collect this page's parsed data
    except queue.Empty:
        pass
    return results  # return once the queue is empty, not after the first page
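To show the whole pattern end to end, here is a self-contained sketch you can run as-is. `fetch_page`, `worker`, and `crawl` are hypothetical names standing in for the real request code (`fetch_page` replaces the `requests.get` call); the structure matches the answer above: a pre-filled queue drained by a fixed pool of workers.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_page(page):
    """Hypothetical stand-in for the real GET request; echoes the page id."""
    return ["result-%s" % page]

def worker(pages_queue):
    """Drain pages from the shared queue until it is empty."""
    results = []
    try:
        while True:  # get_nowait raises queue.Empty once the queue is drained
            page = pages_queue.get_nowait()
            results.extend(fetch_page(page))
    except queue.Empty:
        pass
    return results

def crawl(pages, max_workers=10):
    # "prepare" all pages up front, as in the answer
    pages_queue = queue.Queue()
    for page in pages:
        pages_queue.put(page)

    # one submitted task per worker; each worker pulls pages until the queue is empty
    all_results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(worker, pages_queue) for _ in range(max_workers)]
        for fut in as_completed(futures):
            all_results.extend(fut.result())
    return all_results

print(sorted(crawl(range(5), max_workers=3)))
```

Since `queue.Queue` is thread-safe, each page is taken exactly once even though several workers call `get_nowait` concurrently, so no extra locking is needed.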