I have the following code:
with ThreadPoolExecutor(max_workers=num_of_pages) as executor:
    futh = [executor.submit(self.getdata2, page, hed, data, apifolder, additional) for page in pages]
    for data in as_completed(futh):
        datarALL = datarALL + data.result()
return datarALL
num_of_pages is not fixed, but is usually around 250.

The getdata2 function builds the GET request and returns the result for each page.
The problem is that all 250 pages (threads) are created at once, which means 250 GET requests are fired simultaneously. This overloads the server, so responses are delayed, which triggers a lot of retries: GET calls fail and are retried. I want to avoid that.
I would like to create some kind of lock that prevents a thread/page from issuing its GET request when more than 10 requests are already active. In that case it should wait until a slot becomes available.
Something like:
executing_now = []

def getdata2(...):
    ...
    while len(executing_now) > 10:
        sleep(10)
    executing_now.append(page)
    response = requests.get(url, data=data, headers=hed, verify=False)
    ....
    executing_now.remove(page)
    return ...
Does Python have an existing mechanism for this? It requires threads to check shared memory... and I want to avoid the usual multithreading problems such as deadlocks. Basically: throttle the GET calls and limit how many threads execute at the same time.
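As a side note, Python's standard library does provide exactly this mechanism: threading.Semaphore (or BoundedSemaphore) blocks a thread until one of N slots is free, with no busy-wait loop and no manual shared list. A minimal runnable sketch, where fetch_page, MAX_CONCURRENT, and the sleep standing in for the real GET call are all illustrative names, not part of the original code:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # illustrative cap on in-flight "requests"
semaphore = threading.BoundedSemaphore(MAX_CONCURRENT)

active = []            # pages currently "in flight", for demonstration only
active_lock = threading.Lock()
peak = [0]             # highest concurrency observed

def fetch_page(page):
    # Blocks here until fewer than MAX_CONCURRENT requests are running.
    with semaphore:
        with active_lock:
            active.append(page)
            peak[0] = max(peak[0], len(active))
        time.sleep(0.01)  # stand-in for requests.get(...)
        with active_lock:
            active.remove(page)
        return page * 2   # stand-in for the parsed response

pages = range(50)
with ThreadPoolExecutor(max_workers=50) as executor:
    futures = [executor.submit(fetch_page, p) for p in pages]
    results = [f.result() for f in futures]

print(peak[0] <= MAX_CONCURRENT)  # prints True: the limit was never exceeded
```

All 50 tasks start immediately, but the semaphore guarantees at most 10 are past the `with semaphore:` line at any moment; the rest sleep inside the acquire rather than spinning.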
Answer (score: 1)
We can use a queue. "Prepare" all your pages up front; then you can limit the thread pool to any number of threads, because each thread fetches its next page from the queue:
import queue

# Prepare all your page objects here.
pages_queue = queue.Queue()
for page in pages:
    pages_queue.put(page)

# ThreadPool - each thread takes one page from the queue and,
# when done, fetches the next one.
with ThreadPoolExecutor(max_workers=10) as executor:
    futh = [executor.submit(self.getdata2, pages_queue, hed, data, apifolder, additional)
            for _ in range(10)]
    for data in as_completed(futh):
        datarALL = datarALL + data.result()
return datarALL
def getdata2(...):
    ...
    results = []
    try:
        while True:  # non-blocking get raises Empty once the queue is drained
            page = pages_queue.get_nowait()
            response = requests.get(page.url, data=data, headers=hed, verify=False)
            ....
            results.append(...)  # collect this page's result instead of returning early
    except queue.Empty:
        pass
    return results