Question

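I wrote a script that extracts URLs from a file and sends HTTP requests to all of them concurrently. I now want to limit the number of HTTP requests per second within a session, as well as the bandwidth per interface (eth0, eth1, etc.). Is there a way to do this in Python?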

Answer 1

You can use a Semaphore object, which is part of the standard Python library (see the Python documentation for the threading module).

Or, if you want to work with threads directly, you can use wait([timeout]).

No library bundled with Python operates at the level of Ethernet or other network interfaces. The lowest level you can reach is sockets.
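Since the standard library stops at sockets, one way to approximate a bandwidth cap is to throttle how fast the application reads from a stream. The following is a minimal sketch under that assumption: `throttled_read` and its parameters are hypothetical names, it limits a single stream rather than a whole interface such as eth0, and it is demonstrated on an in-memory buffer instead of a real connection.

```python
import io
import time

def throttled_read(stream, max_bytes_per_sec, chunk_size=1024):
    """Yield chunks from stream, sleeping so the average rate stays at or below the cap.

    Hypothetical helper: works on any object with a read(n) method.
    """
    start = time.monotonic()
    consumed = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        consumed += len(chunk)
        # Earliest moment at which `consumed` bytes are allowed to have been read.
        earliest = start + consumed / max_bytes_per_sec
        delay = earliest - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        yield chunk

if __name__ == "__main__":
    data = io.BytesIO(b"x" * 4000)
    t0 = time.monotonic()
    total = sum(len(c) for c in throttled_read(data, max_bytes_per_sec=20000, chunk_size=1000))
    print(total, "bytes in", round(time.monotonic() - t0, 2), "seconds")
```

With requests, the same loop could probably be fed from a response opened with stream=True, but that only slows down how quickly your script consumes data; a true per-interface limit would still have to be enforced outside the script.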

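Based on your reply, here is my suggestion. Note active_count: it is only there to verify that your script runs just two worker threads. In this case it will report three, because the first thread is your script itself and the other two are the URL requests.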

import time
import requests
import threading

# Limit the number of threads.
pool = threading.BoundedSemaphore(2)

def worker(u):
    try:
        # Request the passed URL.
        r = requests.get(u)
        print(r.status_code)
    finally:
        # Release the semaphore slot for other threads, even if the request fails.
        pool.release()
    # Show the number of active threads (workers plus the main thread).
    print(threading.active_count())

def req():
    # Get URLs from a text file, stripping whitespace and skipping blank lines.
    with open('urllist.txt') as f:
        urls = [line.strip() for line in f if line.strip()]
    for u in urls:
        # Thread pool.
        # Blocks here while the maximum number of worker threads is already running.
        pool.acquire(blocking=True)
        # Create a new thread.
        # Pass each URL (i.e. the u parameter) to the worker function.
        t = threading.Thread(target=worker, args=(u,))
        # Start the newly created thread.
        t.start()

req()
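Note that the semaphore above caps how many requests run at the same time, not how many start per second. If a per-second limit is needed as well, one option is to space the requests out in time. The sketch below assumes a hypothetical helper class, MinIntervalLimiter, that is not part of the answer's code; it hands out time slots 1/rate seconds apart and is safe to share between threads.

```python
import threading
import time

class MinIntervalLimiter:
    """Allow at most `rate` calls per second by spacing them 1/rate seconds apart."""

    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_allowed = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so that other threads can reserve their own slots in the meantime.
        with self.lock:
            now = time.monotonic()
            slot = max(now, self.next_allowed)
            self.next_allowed = slot + self.interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)

if __name__ == "__main__":
    limiter = MinIntervalLimiter(rate=5)  # at most 5 calls per second
    for i in range(10):
        limiter.wait()
        print("call", i, "at", round(time.monotonic(), 2))
```

In the script above, calling limiter.wait() at the top of worker, just before requests.get(u), would be one place to apply it.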

Answer 2

You can use the worker concept described in the documentation: https://docs.python.org/3.4/library/queue.html

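Add a wait() call (for example time.sleep()) inside the workers so that they pause between requests (in the documentation's example: inside the "while True" loop, after task_done).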

Example: 5 "worker" threads with a wait time of 1 second between requests will perform fewer than 5 fetches per second.
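A minimal sketch of that idea, adapted from the worker pattern in the queue documentation. The function name run_workers, its parameters, and the fetch callback are assumptions made for illustration; in a real script fetch would wrap something like requests.get.

```python
import queue
import threading
import time

def run_workers(urls, fetch, num_workers=5, delay=1.0):
    """Process urls with num_workers threads; each thread sleeps `delay` seconds after every request."""
    q = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            url = q.get()
            if url is None:  # sentinel: no more work for this thread
                q.task_done()
                break
            try:
                result = fetch(url)
            except Exception as exc:  # keep the worker alive if one request fails
                result = exc
            with results_lock:
                results.append((url, result))
            q.task_done()
            # Pause before taking the next item, so each worker
            # makes fewer than 1/delay requests per second.
            time.sleep(delay)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        q.put(url)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    q.join()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    # Stand-in for a real HTTP call: just returns the length of the URL string.
    print(run_workers(["http://a", "http://bb"], fetch=len, num_workers=2, delay=0.5))
```

With num_workers=5 and delay=1.0 this stays below 5 fetches per second overall, because every worker spends at least one second sleeping per request.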

Answer 3

Note that the following solution still sends requests sequentially, but it limits the TPS (transactions per second).

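TL;DR: There is a class that keeps count of the number of calls that can still be made in the current second. The count is decremented on every call and refilled every second.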

import time
from multiprocessing import Process, Value

# Naive TPS regulation

# This class holds a bucket of tokens which are refilled every second based on the expected TPS
class TPSBucket:

    def __init__(self, expected_tps):
        self.number_of_tokens = Value('i', 0)
        self.expected_tps = expected_tps
        self.bucket_refresh_process = Process(target=self.refill_bucket_per_second) # process to constantly refill the TPS bucket

    def refill_bucket_per_second(self):
        while True:
            print("refill")
            self.refill_bucket()
            time.sleep(1)

    def refill_bucket(self):
        self.number_of_tokens.value = self.expected_tps
        print('bucket count after refill', self.number_of_tokens.value)

    def start(self):
        self.bucket_refresh_process.start()

    def stop(self):
        self.bucket_refresh_process.kill()

    def get_token(self):
        response = False
        if self.number_of_tokens.value > 0:
            with self.number_of_tokens.get_lock():
                if self.number_of_tokens.value > 0:
                    self.number_of_tokens.value -= 1
                    response = True

        return response

def test():
    tps_bucket = TPSBucket(expected_tps=1) ## Let's say I want to send requests 1 per second
    tps_bucket.start()
    total_number_of_requests = 60 ## Let's say I want to send 60 requests
    request_number = 0
    t0 = time.time()
    while True:
        if tps_bucket.get_token():
            request_number += 1

            print('Request', request_number) ## This is my request

            if request_number == total_number_of_requests:
                break

    print(time.time() - t0, 'seconds elapsed') ## Some metrics to tell me how long everything took
    tps_bucket.stop()


if __name__ == "__main__":
    test()
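The loop in test() polls get_token() continuously, which keeps one CPU core busy while waiting. As a possible variation, the sketch below keeps the same refill-once-per-period idea but refills lazily inside get_token and sleeps until the next refill instead of spinning. It uses threads rather than multiprocessing, and the class name TokenBucket and the period parameter are assumptions, not part of the answer above.

```python
import threading
import time

class TokenBucket:
    """Hold up to `tps` tokens, refilled to full once every `period` seconds."""

    def __init__(self, tps, period=1.0):
        self.tps = tps
        self.period = period
        self.tokens = tps
        self.window_start = time.monotonic()
        self.lock = threading.Lock()

    def get_token(self):
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill lazily once the current window has expired.
                if now - self.window_start >= self.period:
                    self.tokens = self.tps
                    self.window_start = now
                if self.tokens > 0:
                    self.tokens -= 1
                    return
                # Bucket is empty: work out how long until the next refill.
                wait = self.period - (now - self.window_start)
            time.sleep(wait)

if __name__ == "__main__":
    bucket = TokenBucket(tps=2)
    t0 = time.monotonic()
    for n in range(1, 7):
        bucket.get_token()
        print("Request", n, "after", round(time.monotonic() - t0, 2), "seconds")
```

Because the lock is released before sleeping, several threads can share one bucket and still stay within the limit per window.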

Limiting the number of HTTP requests per second in Python