I am trying to increase the number of requests per second. I am currently on Python 2.7 and can get roughly 1 request per second. Do I need to multithread/multiprocess the function, or run multiple instances of the function asynchronously? I have no idea how to make this work. Please help :-)
while True:
    r = requests.post(url, allow_redirects=False, data={
        str(formDataNameLogin): username,
        str(formDataNamePass): password,
    })
    print 'Sending username: %s with password %s' % (username, password)
Answer 0 (score: 1)
Just use any asynchronous library. I think the asynchronous flavors of requests, such as grequests, txrequests, requests-futures, and requests-threads, will suit you best. Below is a code example from the grequests README:
import grequests

urls = [
    'http://www.heroku.com',
    'http://python-tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://fakedomain/',
    'http://kennethreitz.com'
]
Create a set of unsent requests:
rs = (grequests.get(u) for u in urls)
Send them all at the same time:
grequests.map(rs)
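The same pattern should carry over to the POST loop from the question. A minimal sketch, assuming the url, formDataNameLogin, formDataNamePass, username, and password variables from the question are defined; the batch of 100 and the pool size of 20 are arbitrary:

import grequests

# Build a batch of unsent POST requests reusing the question's form data
reqs = (grequests.post(url, allow_redirects=False, data={
    str(formDataNameLogin): username,
    str(formDataNamePass): password,
}) for _ in range(100))

# Send them concurrently; size limits how many run at once
responses = grequests.map(reqs, size=20)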
Using or learning the other mentioned modules, e.g. requests-threads, may be a bit more involved, especially with Python 2:
from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react
from requests_threads import AsyncSession

session = AsyncSession(n=100)

@inlineCallbacks
def main(reactor):
    responses = []
    for i in range(100):
        responses.append(session.get('http://httpbin.org/get'))

    for response in responses:
        r = yield response
        print(r)

if __name__ == '__main__':
    react(main)
asyncio and aiohttp may be even more noteworthy, but, I guess, it is easier to learn a flavor of a module you are already familiar with.
Multithreading is not required for the approaches above, but you can also try multithreading or, maybe even better, multiprocessing, and check which performs best (see the sketch below).
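As a rough illustration, here is a minimal pool-based sketch of the question's POST loop, assuming url, formDataNameLogin, formDataNamePass, username, and password are defined as in the question; the worker count of 10 and batch of 100 are arbitrary:

from multiprocessing.dummy import Pool  # thread pool; swap for multiprocessing.Pool to use processes
import requests

def send_login(_):
    # One POST per call, reusing the form data from the question
    r = requests.post(url, allow_redirects=False, data={
        str(formDataNameLogin): username,
        str(formDataNamePass): password,
    })
    return r.status_code

if __name__ == '__main__':
    pool = Pool(10)                                  # 10 concurrent workers
    for status in pool.map(send_login, range(100)):  # 100 requests total
        print(status)
    pool.close()
    pool.join()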
Answer 1 (score: 1)
You can run multiple parallel requests with multithreading, like this:
import Queue
import threading
import time
import requests

exit_flag = 0


class RequestThread(threading.Thread):
    def __init__(self, thread_id, name, q):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.name = name
        self.q = q

    def run(self):
        print("Starting {0:s}".format(self.name))
        process_data(self.name, self.q)
        print("Exiting {0:s}".format(self.name))


def process_data(thread_name, q):
    # Pull URLs off the shared queue until the main thread signals exit
    while not exit_flag:
        queue_lock.acquire()
        if not q.empty():
            data = q.get()
            queue_lock.release()
            print("{0:s} processing {1:s}".format(thread_name, data))
            response = requests.get(data)
            print(response)
        else:
            queue_lock.release()
            time.sleep(1)


thread_list = ["Thread-1", "Thread-2", "Thread-3"]
request_list = [
    "https://api.github.com/events",
    "http://api.plos.org/search?q=title:THREAD",
    "http://api.plos.org/search?q=title:DNA",
    "http://api.plos.org/search?q=title:PYTHON",
    "http://api.plos.org/search?q=title:JAVA"
]
queue_lock = threading.Lock()
work_queue = Queue.Queue(10)
threads = []
thread_id = 1

# Create new threads
for t_name in thread_list:
    thread = RequestThread(thread_id, t_name, work_queue)
    thread.start()
    threads.append(thread)
    thread_id += 1

# Fill the queue
queue_lock.acquire()
for word in request_list:
    work_queue.put(word)
queue_lock.release()

# Wait for the queue to empty
while not work_queue.empty():
    pass

# Notify threads it's time to exit
exit_flag = 1

# Wait for all threads to complete
for t in threads:
    t.join()

print("Exiting Main Thread")
Output:
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-1 processing https://api.github.com/events
Thread-2 processing http://api.plos.org/search?q=title:THREAD
Thread-3 processing http://api.plos.org/search?q=title:DNA
<Response [200]>
<Response [200]>
<Response [200]>
Thread-2 processing http://api.plos.org/search?q=title:PYTHON
Thread-3 processing http://api.plos.org/search?q=title:JAVA
Exiting Thread-1
<Response [200]>
<Response [200]>
Exiting Thread-3
Exiting Thread-2
Exiting Main Thread
Although I am not a multithreading expert, here is a little explanation:
1. Queue
The Queue module lets you create a new queue object that can hold a specific number of items. The queue is controlled with methods such as put(), get(), empty(), full(), and qsize().
From my modest experience with multithreading, this is useful for keeping track of which data still has to be processed. I have had situations where threads were all doing the same thing, or all but one exited; the queue helped me control the shared data that remained to be processed.
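For illustration, a tiny sketch of the basic put/get cycle with a hypothetical URL, using the same Python 2 Queue module as above:

import Queue

q = Queue.Queue(maxsize=3)   # holds at most 3 items
q.put("http://example.com")  # add work
print(q.qsize())             # -> 1
print(q.empty())             # -> False
print(q.get())               # -> "http://example.com" (removes it)
print(q.empty())             # -> True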
2. Lock
The threading module provided with Python includes an easy-to-use locking mechanism that allows you to synchronize threads. A new lock is created by calling Lock(), which returns the new lock.
A primitive lock is in one of two states, "locked" or "unlocked". It is created in the unlocked state. It has two basic methods, acquire() and release(). When the state is unlocked, acquire() changes the state to locked and returns immediately. When the state is locked, acquire() blocks until a call to release() in another thread changes it to unlocked, then the acquire() call resets it to locked and returns. The release() method should only be called in the locked state; it changes the state to unlocked and returns immediately. If an attempt is made to release an unlocked lock, a ThreadError will be raised.
In more human language: a lock is the most fundamental synchronization mechanism provided by the threading module. At any time, a lock can be held by a single thread, or by no thread at all. If a thread attempts to hold a lock that is already held by some other thread, execution of the first thread is halted until the lock is released.
Locks are typically used to synchronize access to a shared resource. For each shared resource, create a Lock object. When you need to access the resource, call acquire() to hold the lock (this will wait for the lock to be released, if necessary), and call release() to release it.
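A minimal sketch of that pattern with a hypothetical shared counter; the with statement is equivalent to the paired acquire()/release() calls:

import threading

counter_lock = threading.Lock()
counter = 0

def increment():
    global counter
    counter_lock.acquire()      # wait until no other thread holds the lock
    try:
        counter += 1            # exclusive access to the shared resource
    finally:
        counter_lock.release()  # always release, even on error

def increment_with_context_manager():
    global counter
    with counter_lock:          # same acquire/release, handled automatically
        counter += 1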
3. Threading
To implement a new thread using the threading module, you define a new subclass of Thread, override __init__() to add any extra arguments, and override run() to implement what the thread should do when it is started.
Once you have created the new Thread subclass, you can instantiate it and then start a new thread by calling start(), which in turn calls the run() method.
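A stripped-down sketch of that lifecycle, with a hypothetical greeting thread:

import threading

class GreeterThread(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        # Executed in the new thread after start() is called
        print("Hello from {0:s}".format(self.name))

t = GreeterThread("worker-1")
t.start()   # spawns the thread and invokes run()
t.join()    # wait for it to finish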
Using a single thread:
$ time python single.py
Processing request url: https://api.github.com/events
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:THREAD
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:DNA
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:PYTHON
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:JAVA
<Response [200]>
Exiting Main Thread
real 0m22.310s
user 0m0.096s
sys 0m0.022s
Using 3 threads:
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-3 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
<Response [200]>
<Response [200]>
<Response [200]>
Thread-1 processing http://api.plos.org/search?q=title:PYTHON
Thread-2 processing http://api.plos.org/search?q=title:JAVA
Exiting Thread-3
<Response [200]>
<Response [200]>
Exiting Thread-1
Exiting Thread-2
Exiting Main Thread
real 0m11.726s
user 0m6.692s
sys 0m0.028s
Using 5 threads:
$ time python multi.py
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Thread-5 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
Thread-3 processing http://api.plos.org/search?q=title:PYTHONThread-4 processing http://api.plos.org/search?q=title:JAVA
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
Exiting Thread-5
Exiting Thread-4
Exiting Thread-2
Exiting Thread-3
Exiting Thread-1
Exiting Main Thread
real 0m6.446s
user 0m1.104s
sys 0m0.029s
5 threads are almost 4 times faster, and those are only 5 dummy requests. Imagine a bigger chunk of data.
Please note: I have only tested this under Python 2.7. For Python 3.x, minor adjustments may be needed.
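For reference, the most likely adjustments are the renamed standard-library module and the print syntax; a sketch of the affected lines only, based on the standard Python 3 renames and not tested here:

import queue                  # Python 3 name of the Queue module

work_queue = queue.Queue(10)  # same API as Queue.Queue in Python 2
# print is a function in Python 3; the print("...") calls above already work unchanged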