How do I set a maximum number of requests per second on the client side with aiohttp (i.e. throttle them)?
Answer 0 (score: 24)
Since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.
You can modify the limit by creating your own TCPConnector and passing it into the ClientSession. For instance, to create a client limited to 50 simultaneous requests:

import aiohttp

connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)

In case it better suits your use case, there is also a limit_per_host parameter (off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". Per the docs:

limit_per_host (int) – limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.

Example usage:
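A minimal sketch of such an example, mirroring the snippet above but using limit_per_host instead of limit (the value 50 is only illustrative):

import aiohttp

# Limit simultaneous connections per (host, port, is_ssl) endpoint rather than globally.
connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)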
Answer 1 (score: 22)
I found a possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html
Doing 3 requests at the same time is cool; doing 5000, however, is not so nice. If you try to make too many requests at once, connections may start to get closed, or you may even get banned from the website.
To avoid this, you can use a semaphore. It is a synchronization tool that can be used to limit the number of coroutines doing something at any given moment. We create the semaphore before creating the loop, passing as an argument the number of simultaneous requests we want to allow:
sem = asyncio.Semaphore(5)
Then, we simply replace:
page = yield from get(url, compress=True)
with the same thing, but protected by the semaphore:
with (yield from sem):
    page = yield from get(url, compress=True)
This ensures that at most 5 requests can be in flight at the same time.
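The snippet above uses the old yield from style; a minimal sketch of the same idea with modern async/await syntax and aiohttp (fetch, main and the example URLs are hypothetical names, not from the linked article):

import asyncio
import aiohttp

async def fetch(session, sem, url):
    # The semaphore is acquired before the request and released when the block exits.
    async with sem:
        async with session.get(url) as resp:
            return await resp.text()

async def main(urls):
    sem = asyncio.Semaphore(5)  # at most 5 requests in flight at once
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, sem, url) for url in urls))

# results = asyncio.run(main(['https://example.com/1', 'https://example.com/2']))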
Answer 2 (score: 1)
Here is an example without aiohttp, but you can wrap any async method (or aiohttp.request) with the Limit decorator:
import asyncio
import time

class Limit(object):
    def __init__(self, calls=5, period=1):
        self.calls = calls            # max calls allowed per period
        self.period = period          # period length in seconds
        self.clock = time.monotonic
        self.last_reset = 0
        self.num_calls = 0

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            # If the call quota for the current period is used up, sleep until it ends.
            if self.num_calls >= self.calls:
                await asyncio.sleep(self.__period_remaining())

            # Start a new period once the previous one has elapsed.
            period_remaining = self.__period_remaining()
            if period_remaining <= 0:
                self.num_calls = 0
                self.last_reset = self.clock()

            self.num_calls += 1
            return await func(*args, **kwargs)

        return wrapper

    def __period_remaining(self):
        elapsed = self.clock() - self.last_reset
        return self.period - elapsed

@Limit(calls=5, period=2)
async def test_call(x):
    print(x)

async def worker():
    for x in range(100):
        await test_call(x + 1)

asyncio.run(worker())
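As a sketch of the aiohttp case mentioned above (fetch_page, session and url are hypothetical names; any coroutine can be decorated the same way):

@Limit(calls=5, period=1)
async def fetch_page(session, url):
    async with session.get(url) as resp:
        return await resp.text()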
Answer 3 (score: 0)
You can either set a delay per request, or split the URLs into batches and throttle the batches to meet a desired frequency.
Use asyncio.sleep to force the script to wait between requests:
import asyncio
import aiohttp

delay_per_request = 0.5
urls = [
    # put some URLs here...
]

async def app():
    tasks = []
    for url in urls:
        tasks.append(asyncio.ensure_future(make_request(url)))
        await asyncio.sleep(delay_per_request)  # wait before starting the next request

    results = await asyncio.gather(*tasks)
    return results

async def make_request(url):
    print('$$$ making request')
    async with aiohttp.ClientSession() as sess:
        async with sess.get(url) as resp:
            status = resp.status
            text = await resp.text()
            print('### got page data')
            return url, status, text
This can then be run with, for example, results = asyncio.run(app()).
Using make_request from above, you can request and throttle batches of URLs like this:
import asyncio
import aiohttp
import time

max_requests_per_second = 0.5
urls = [[
    # put a few URLs here...
], [
    # put a few more URLs here...
]]

async def app():
    results = []
    for i, batch in enumerate(urls):
        t_0 = time.time()
        print(f'batch {i}')
        tasks = [asyncio.ensure_future(make_request(url)) for url in batch]
        for t in tasks:
            d = await t
            results.append(d)
        t_1 = time.time()

        # Throttle requests: wait until the batch has used up its time budget.
        batch_time = (t_1 - t_0)
        batch_size = len(batch)
        wait_time = (batch_size / max_requests_per_second) - batch_time
        if wait_time > 0:
            print(f'Too fast! Waiting {wait_time} seconds')
            # All tasks in this batch have already completed, so a blocking sleep is acceptable here.
            time.sleep(wait_time)

    return results
Again, this can be run with asyncio.run(app()).