Limiting the number of HTTP requests per second in Python

Posted: 2014-09-29 11:21:51

Tags: python python-multithreading throttling bandwidth-throttling

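I wrote a script that reads URLs from a file and sends HTTP requests to all of them concurrently. I now want to limit the number of HTTP requests per second in a session, and the bandwidth per interface (eth0, eth1, etc.). Is there a way to do this in Python?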

3 answers:

Answer 0 (score: 0)

You can use a Semaphore object, which is part of the Python standard library: python doc

Or, if you want to work with threads directly, you can use wait([timeout]).
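A minimal sketch of the wait([timeout]) approach, assuming it refers to threading.Event.wait: a worker pauses between requests with stop_event.wait(interval), which sleeps like time.sleep(interval) but returns True immediately if another thread calls stop_event.set(), so the paced loop can be cancelled. The names paced_worker and fetch are hypothetical; fetch stands in for something like requests.get.

```python
import threading
import time


def paced_worker(urls, fetch, interval, stop_event, results):
    """Call fetch(url) for each URL, pausing `interval` seconds between calls.

    stop_event.wait(interval) blocks for up to `interval` seconds and returns
    True early if another thread sets the event, which cancels the loop.
    """
    for i, url in enumerate(urls):
        if i > 0 and stop_event.wait(interval):
            break  # stop was requested while waiting
        results.append(fetch(url))


if __name__ == "__main__":
    stop = threading.Event()
    out = []
    t0 = time.monotonic()
    # Hypothetical fetch that just echoes the URL instead of doing real HTTP.
    t = threading.Thread(target=paced_worker,
                         args=(["a", "b", "c"], lambda u: u, 0.2, stop, out))
    t.start()
    t.join()
    print(out, round(time.monotonic() - t0, 1))
```

Calling stop.set() from the main thread while the worker is waiting makes it exit without finishing the remaining URLs.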

No library bundled with Python works at the level of Ethernet or other network interfaces. The lowest level you can go down to is sockets.

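Based on your reply, here is my suggestion. Note the active_count call: it is only there to check that your script runs just two worker threads at a time. In that case it will report three threads, because the first one is your script itself (the main thread), followed by the two URL request threads.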

import threading

import requests

# Limit the number of concurrent worker threads to 2.
pool = threading.BoundedSemaphore(2)

def worker(u):
    try:
        # Request the passed URL.
        r = requests.get(u)
        print(r.status_code)
    finally:
        # Release the slot even if the request raised,
        # otherwise the main loop would eventually block forever.
        pool.release()
    # Show the number of active threads (main thread + workers).
    print(threading.active_count())

def req():
    # Read URLs from a text file, stripping whitespace and skipping blank lines.
    with open('urllist.txt') as f:
        urls = [line.strip() for line in f if line.strip()]
    for u in urls:
        # Block here while 2 workers are already running.
        pool.acquire(blocking=True)
        # Create a new thread and pass each URL (the u parameter) to the worker function.
        t = threading.Thread(target=worker, args=(u,))
        # Start the newly created thread.
        t.start()

req()

Answer 1 (score: 0)

You can use the worker pattern described in the documentation: https://docs.python.org/3.4/library/queue.html

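Add a wait (for example time.sleep()) in each worker so that it pauses between requests (in the documentation's example, inside the while True loop after task_done()).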

示例:5"工作人员" - 请求之间等待时间为1秒的线程将少于每秒5次提取。

Answer 2 (score: 0)

Note that the following solution still sends requests sequentially, but it limits the TPS (transactions per second).

TL;DR: a class keeps count of how many calls can still be made in the current second. The count goes down with every call and is refilled every second.

import time
from multiprocessing import Process, Value

# Naive TPS regulation

# This class holds a bucket of tokens which are refilled every second based on the expected TPS
class TPSBucket:

    def __init__(self, expected_tps):
        self.number_of_tokens = Value('i', 0)
        self.expected_tps = expected_tps
        self.bucket_refresh_process = Process(target=self.refill_bucket_per_second) # process to constantly refill the TPS bucket

    def refill_bucket_per_second(self):
        while True:
            print("refill")
            self.refill_bucket()
            time.sleep(1)

    def refill_bucket(self):
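        # Hold the lock so a refill cannot interleave with a decrement in get_token().
        with self.number_of_tokens.get_lock():
            self.number_of_tokens.value = self.expected_tps
        print('bucket count after refill', self.number_of_tokens.value)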

    def start(self):
        self.bucket_refresh_process.start()

    def stop(self):
        # Stop the refill process and wait for it to exit.
        self.bucket_refresh_process.kill()
        self.bucket_refresh_process.join()

    def get_token(self):
        response = False
        if self.number_of_tokens.value > 0:
            with self.number_of_tokens.get_lock():
                if self.number_of_tokens.value > 0:
                    self.number_of_tokens.value -= 1
                    response = True

        return response

def test():
    tps_bucket = TPSBucket(expected_tps=1)  # Let's say I want to send 1 request per second
    tps_bucket.start()
    total_number_of_requests = 60  # Let's say I want to send 60 requests
    request_number = 0
    t0 = time.time()
    while True:
        if tps_bucket.get_token():
            request_number += 1

            print('Request', request_number)  # This is where my request goes

            if request_number == total_number_of_requests:
                break
        else:
            # No token left in this second: back off briefly instead of spinning the CPU.
            time.sleep(0.01)

    print(time.time() - t0, 'time elapsed')  # Some metrics to tell me how long everything took
    tps_bucket.stop()


if __name__ == "__main__":
    test()
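For comparison, here is a minimal thread-based sketch of the same token-bucket idea that needs neither a separate refill process nor a polling loop: tokens are refilled continuously from time.monotonic() inside acquire(), and callers sleep until a whole token has accumulated. RateLimiter and its rate/capacity parameters are hypothetical names, not a library API; an instance can be shared between threads of one process, but not between processes.

```python
import threading
import time


class RateLimiter:
    """Token bucket: on average at most `rate` acquisitions per second,
    with bursts of up to `capacity` tokens (defaults to `rate`)."""

    def __init__(self, rate, capacity=None):
        self.rate = float(rate)
        self.capacity = float(capacity if capacity is not None else rate)
        self.tokens = self.capacity  # start with a full bucket
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill in proportion to the time elapsed since the last check.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Time until one whole token has accumulated.
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)  # sleep outside the lock so other threads can check


if __name__ == "__main__":
    limiter = RateLimiter(rate=5)  # 5 requests per second, bursts of up to 5
    t0 = time.monotonic()
    for i in range(10):
        limiter.acquire()  # call this right before each HTTP request
        print("Request", i + 1, round(time.monotonic() - t0, 2))
```

With a full bucket the first 5 calls return immediately and the rest are spaced about 0.2 s apart; passing capacity=1 removes the initial burst.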