I am trying to increase the number of requests per second. I am currently on Python 2.7 and can get roughly 1 request per second. Do I need to multithread/multiprocess the function, or run multiple instances of the function asynchronously? I have no idea how to make this work. Please help :-)
while True:
    r = requests.post(url, allow_redirects=False, data={
        str(formDataNameLogin): username,
        str(formDataNamePass): password,
    })
    print 'Sending username: %s with password %s' % (username, password)
Answer 0 (score: 1)
Just use any asynchronous library. I think the asynchronous flavors of requests, such as grequests, txrequests, requests-futures, and requests-threads, will suit you best. Below is a code example from the grequests README:
import grequests

urls = [
    'http://www.heroku.com',
    'http://python-tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://fakedomain/',
    'http://kennethreitz.com'
]
Create a set of unsent requests:
rs = (grequests.get(u) for u in urls)
Send them all at the same time:
grequests.map(rs)
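The same pattern should carry over to the POST loop from the question. A minimal sketch, assuming the url, formDataNameLogin, formDataNamePass, username, and password variables from the question are defined; the batch of 100 and the pool size of 20 are arbitrary:

import grequests

# Build a batch of unsent POST requests reusing the question's form data
reqs = (grequests.post(url, allow_redirects=False, data={
    str(formDataNameLogin): username,
    str(formDataNamePass): password,
}) for _ in range(100))

# Send them concurrently; size limits how many run at once
responses = grequests.map(reqs, size=20)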
Using or learning the other mentioned modules, e.g. requests-threads, may be a bit more involved, especially with Python 2:
from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react
from requests_threads import AsyncSession

session = AsyncSession(n=100)

@inlineCallbacks
def main(reactor):
    responses = []
    for i in range(100):
        responses.append(session.get('http://httpbin.org/get'))

    for response in responses:
        r = yield response
        print(r)

if __name__ == '__main__':
    react(main)
asyncio and aiohttp may be even more noteworthy, but, I guess, it is easier to learn a flavor of a module you are already familiar with.
Multithreading is not required for the approaches above, but you can also try multithreading or, maybe even better, multiprocessing, and check which performs best (see the sketch below).
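As a rough illustration, here is a minimal pool-based sketch of the question's POST loop, assuming url, formDataNameLogin, formDataNamePass, username, and password are defined as in the question; the worker count of 10 and batch of 100 are arbitrary:

from multiprocessing.dummy import Pool  # thread pool; swap for multiprocessing.Pool to use processes
import requests

def send_login(_):
    # One POST per call, reusing the form data from the question
    r = requests.post(url, allow_redirects=False, data={
        str(formDataNameLogin): username,
        str(formDataNamePass): password,
    })
    return r.status_code

if __name__ == '__main__':
    pool = Pool(10)                                  # 10 concurrent workers
    for status in pool.map(send_login, range(100)):  # 100 requests total
        print(status)
    pool.close()
    pool.join()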
Answer 1 (score: 1)
You can run multiple parallel requests with multithreading, like this:
import Queue
import threading
import time
import requests

exit_flag = 0


class RequestThread(threading.Thread):
    def __init__(self, thread_id, name, q):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.name = name
        self.q = q

    def run(self):
        print("Starting {0:s}".format(self.name))
        process_data(self.name, self.q)
        print("Exiting {0:s}".format(self.name))


def process_data(thread_name, q):
    # Pull URLs off the shared queue until the main thread signals exit
    while not exit_flag:
        queue_lock.acquire()
        if not q.empty():
            data = q.get()
            queue_lock.release()
            print("{0:s} processing {1:s}".format(thread_name, data))
            response = requests.get(data)
            print(response)
        else:
            queue_lock.release()
            time.sleep(1)


thread_list = ["Thread-1", "Thread-2", "Thread-3"]
request_list = [
    "https://api.github.com/events",
    "http://api.plos.org/search?q=title:THREAD",
    "http://api.plos.org/search?q=title:DNA",
    "http://api.plos.org/search?q=title:PYTHON",
    "http://api.plos.org/search?q=title:JAVA"
]
queue_lock = threading.Lock()
work_queue = Queue.Queue(10)
threads = []
thread_id = 1

# Create new threads
for t_name in thread_list:
    thread = RequestThread(thread_id, t_name, work_queue)
    thread.start()
    threads.append(thread)
    thread_id += 1

# Fill the queue
queue_lock.acquire()
for word in request_list:
    work_queue.put(word)
queue_lock.release()

# Wait for the queue to empty
while not work_queue.empty():
    pass

# Notify threads it's time to exit
exit_flag = 1

# Wait for all threads to complete
for t in threads:
    t.join()

print("Exiting Main Thread")
Output:
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-1 processing https://api.github.com/events
Thread-2 processing http://api.plos.org/search?q=title:THREAD
Thread-3 processing http://api.plos.org/search?q=title:DNA
<Response [200]>
<Response [200]>
<Response [200]>
Thread-2 processing http://api.plos.org/search?q=title:PYTHON
Thread-3 processing http://api.plos.org/search?q=title:JAVA
Exiting Thread-1
<Response [200]>
<Response [200]>
Exiting Thread-3
Exiting Thread-2
Exiting Main Thread
Although I am not a multithreading expert, here is a little explanation:
1. Queue
The Queue module lets you create a new queue object that can hold a specific number of items. The queue is controlled with methods such as put(), get(), empty(), full(), and qsize().
From my modest experience with multithreading, this is useful for keeping track of which data still has to be processed. I have had situations where threads were all doing the same thing, or all but one exited; the queue helped me control the shared data that remained to be processed.
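For illustration, a tiny sketch of the basic put/get cycle with a hypothetical URL, using the same Python 2 Queue module as above:

import Queue

q = Queue.Queue(maxsize=3)   # holds at most 3 items
q.put("http://example.com")  # add work
print(q.qsize())             # -> 1
print(q.empty())             # -> False
print(q.get())               # -> "http://example.com" (removes it)
print(q.empty())             # -> True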
2. Lock
The threading module provided with Python includes an easy-to-use locking mechanism that allows you to synchronize threads. A new lock is created by calling Lock(), which returns the new lock.
A primitive lock is in one of two states, "locked" or "unlocked". It is created in the unlocked state. It has two basic methods, acquire() and release(). When the state is unlocked, acquire() changes the state to locked and returns immediately. When the state is locked, acquire() blocks until a call to release() in another thread changes it to unlocked, then the acquire() call resets it to locked and returns. The release() method should only be called in the locked state; it changes the state to unlocked and returns immediately. If an attempt is made to release an unlocked lock, a ThreadError will be raised.
In more human language: a lock is the most fundamental synchronization mechanism provided by the threading module. At any time, a lock can be held by a single thread, or by no thread at all. If a thread attempts to hold a lock that is already held by some other thread, execution of the first thread is halted until the lock is released.
Locks are typically used to synchronize access to a shared resource. For each shared resource, create a Lock object. When you need to access the resource, call acquire() to hold the lock (this will wait for the lock to be released, if necessary), and call release() to release it.
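A minimal sketch of that pattern with a hypothetical shared counter; the with statement is equivalent to the paired acquire()/release() calls:

import threading

counter_lock = threading.Lock()
counter = 0

def increment():
    global counter
    counter_lock.acquire()      # wait until no other thread holds the lock
    try:
        counter += 1            # exclusive access to the shared resource
    finally:
        counter_lock.release()  # always release, even on error

def increment_with_context_manager():
    global counter
    with counter_lock:          # same acquire/release, handled automatically
        counter += 1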
3. Threading
To implement a new thread using the threading module, you define a new subclass of Thread, override __init__() to add any extra arguments, and override run() to implement what the thread should do when it is started.
Once you have created the new Thread subclass, you can instantiate it and then start a new thread by calling start(), which in turn calls the run() method.
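A stripped-down sketch of that lifecycle, with a hypothetical greeting thread:

import threading

class GreeterThread(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        # Executed in the new thread after start() is called
        print("Hello from {0:s}".format(self.name))

t = GreeterThread("worker-1")
t.start()   # spawns the thread and invokes run()
t.join()    # wait for it to finish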
Using a single thread:
$ time python single.py
Processing request url: https://api.github.com/events
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:THREAD
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:DNA
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:PYTHON
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:JAVA
<Response [200]>
Exiting Main Thread
real 0m22.310s
user 0m0.096s
sys 0m0.022s
Using 3 threads:
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-3 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
<Response [200]>
<Response [200]>
<Response [200]>
Thread-1 processing http://api.plos.org/search?q=title:PYTHON
Thread-2 processing http://api.plos.org/search?q=title:JAVA
Exiting Thread-3
<Response [200]>
<Response [200]>
Exiting Thread-1
Exiting Thread-2
Exiting Main Thread
real 0m11.726s
user 0m6.692s
sys 0m0.028s
Using 5 threads:
$ time python multi.py
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Thread-5 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
Thread-3 processing http://api.plos.org/search?q=title:PYTHONThread-4 processing http://api.plos.org/search?q=title:JAVA
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
Exiting Thread-5
Exiting Thread-4
Exiting Thread-2
Exiting Thread-3
Exiting Thread-1
Exiting Main Thread
real 0m6.446s
user 0m1.104s
sys 0m0.029s
5 threads are almost 4 times faster, and those are only 5 dummy requests. Imagine a bigger chunk of data.
Please note: I have only tested this under Python 2.7. For Python 3.x, minor adjustments may be needed.
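For reference, the most likely adjustments are the renamed standard-library module and the print syntax; a sketch of the affected lines only, based on the standard Python 3 renames and not tested here:

import queue                  # Python 3 name of the Queue module

work_queue = queue.Queue(10)  # same API as Queue.Queue in Python 2
# print is a function in Python 3; the print("...") calls above already work unchanged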