We have a queue of jobs, which workers process one at a time. Each job requires us to format some data and issue an HTTP POST request with that data as the request payload.
How can we have each worker issue these HTTP POST requests asynchronously, in a single-threaded, non-blocking manner? We don't care about the response to the request; all we want is for the request to execute as soon as possible, and for the worker to then immediately move on to the next job.
We have explored the gevent and grequests libraries (see Why does gevent.spawn not execute the parameterized function until a call to Greenlet.join?). Our worker code looks something like this:
def execute_task(worker, job):
    print "About to spawn request"
    greenlet = gevent.spawn(requests.post, url, params=params)
    print "Request spawned, about to call sleep"
    gevent.sleep()
    print "Greenlet status: ", greenlet.ready()
The first print statement executes, but the second and third never print, and the url is never hit.
How can we get these asynchronous requests to execute?
Answer 0 (score: 1)
1) Make a Queue.Queue object
2) Make as many "worker" threads as you like, each reading from the Queue.Queue
3) Feed the jobs onto the Queue.Queue
The worker threads will read the Queue.Queue in the order the jobs are placed on it.
An example that reads lines from a file and puts them onto the Queue.Queue asynchronously:
import sys
import urllib2
import urllib
from Queue import Queue
import threading

THEEND = "TERMINATION-NOW-THE-END"

# read from file into Queue.Queue asynchronously
class QueueFile(threading.Thread):
    def run(self):
        if not isinstance(self.myq, Queue):
            print "Queue not set to a Queue"
            sys.exit(1)
        h = open(self.f, 'r')
        for l in h:
            self.myq.put(l.strip())  # this will block if the queue is full
        self.myq.put(THEEND)

    def set_queue(self, q):
        self.myq = q

    def set_file(self, f):
        self.f = f
To give you an idea of what a worker thread might look like (example only):
class myWorker(threading.Thread):
    # assumes self.q and self.running are set on the instance before start()
    def run(self):
        while self.running:
            try:
                data = self.q.get()  # read from the queue
                req = urllib2.Request("http://192.168.1.10/url/path")
                req.add_data(urllib.urlencode(data))
                h1 = urllib2.urlopen(req, timeout=10)
                res = h1.read()
                assert(len(res) > 80)
            except urllib2.HTTPError, e:
                print e
            except urllib2.URLError, e:
                print e
                sys.exit()
Both objects are based on threading.Thread. To set things in motion, create the instances and then call "start" on each of them.
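The answer above is written against Python 2 (Queue.Queue, urllib2). Here is a minimal, self-contained Python 3 sketch of the same producer/consumer pattern; `post_job` is a hypothetical stand-in for the real HTTP POST, and the sentinel-per-worker shutdown is one common convention, not the only one:

```python
import queue
import threading

SENTINEL = object()  # placed on the queue once per worker to signal shutdown

def post_job(job):
    # Stand-in for the real HTTP POST of `job`.
    return "posted:%s" % job

def worker(q, results):
    # Pull jobs off the queue until the sentinel arrives.
    while True:
        job = q.get()
        if job is SENTINEL:
            break
        results.append(post_job(job))

q = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(4)]
for t in threads:
    t.start()
for job in ["a", "b", "c", "d", "e"]:
    q.put(job)
for _ in threads:
    q.put(SENTINEL)  # one sentinel per worker so every thread exits
for t in threads:
    t.join()
print(sorted(results))  # → ['posted:a', 'posted:b', 'posted:c', 'posted:d', 'posted:e']
```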
Answer 1 (score: 1)
You have to run it in a different thread or use the built-in asyncore library. Most libraries will use threading without you ever knowing it, or they will rely on asyncore, which is a standard part of Python.
Here is a combination of threading and asyncore:
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import asyncore, socket
from threading import *
from time import sleep
from os import _exit
from logger import *  # <- Non-standard library containing a log function
from config import *  # <- Non-standard library containing settings such as "server"

class logDispatcher(Thread, asyncore.dispatcher):
    def __init__(self, config=None):
        self.inbuffer = ''
        self.buffer = ''
        self.lockedbuffer = False
        self.is_writable = False
        self.is_connected = False
        self.exit = False
        self.initated = False

        asyncore.dispatcher.__init__(self)
        Thread.__init__(self)

        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.connect((server, server_port))
        except:
            log('Could not connect to ' + server, 'LOG_SOCK')
            return None

        self.start()

    def handle_connect_event(self):
        self.is_connected = True

    def handle_connect(self):
        self.is_connected = True
        log('Connected to ' + str(server), 'LOG_SOCK')

    def handle_close(self):
        self.is_connected = False
        self.close()

    def handle_read(self):
        data = self.recv(8192)
        while self.lockedbuffer:
            sleep(0.01)
        self.inbuffer += data

    def handle_write(self):
        while self.is_writable:
            sent = self.send(self.buffer)
            sleep(1)
            self.buffer = self.buffer[sent:]
            if len(self.buffer) <= 0:
                self.is_writable = False
            sleep(0.01)

    def _send(self, what):
        self.buffer += what + '\r\n'
        self.is_writable = True

    def run(self):
        self._send('GET / HTTP/1.1\r\n')

while 1:
    logDispatcher()  # <- Initiate one for each request.
    asyncore.loop(0.1)
    log('All threads are done, next loop in 10', 'CORE')
    sleep(10)
Or you can simply make a thread that does the job and then dies.
from threading import *

class worker(Thread):
    def __init__(self, host, postdata):
        Thread.__init__(self)
        self.host = host
        self.postdata = postdata
        self.start()

    def run(self):
        sock.send(self.postdata)  # Pseudo-code: create the socket first!

for data in postDataObjects:
    worker('example.com', data)
If you need to limit the number of threads (if you're sending more than ~5k posts it might strain the system), just do while len(enumerate()) > 1000: sleep(0.1) (where enumerate is threading.enumerate) and let the looping code wait for some threads to die off.
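Polling threading.enumerate() works, but a Semaphore expresses the same cap more directly: acquire a slot before starting a thread, release it when the thread finishes. This is a sketch of that alternative, not the answer's exact code; MAX_THREADS and do_post are illustrative names:

```python
import threading

MAX_THREADS = 8
gate = threading.Semaphore(MAX_THREADS)  # at most MAX_THREADS workers alive
done = []
done_lock = threading.Lock()

def do_post(data):
    # Stand-in for the real socket/POST work.
    with done_lock:
        done.append(data)

def worker(data):
    try:
        do_post(data)
    finally:
        gate.release()  # free a slot for the next thread, even on error

threads = []
for data in range(100):
    gate.acquire()  # blocks while MAX_THREADS workers are still running
    t = threading.Thread(target=worker, args=(data,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
print(len(done))  # → 100
```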
Answer 2 (score: 1)
You may want to use the join method rather than sleep, and then check the status. If you want to execute one request at a time, that will solve the problem. Modifying your code slightly to test it, it appears to work fine:
import gevent
import requests

def execute_task(worker, job):
    print "About to spawn request"
    greenlet = gevent.spawn(requests.get, 'http://example.com', params={})
    print "Request spawned, about to call sleep"
    gevent.sleep()
    print "Greenlet status: ", greenlet.ready()
    print greenlet.get()

execute_task(None, None)
This gives the result:
About to spawn request
Request spawned, about to call sleep
Greenlet status: True
<Response [200]>
Is there anything else going on in this Python process that could be preventing gevent from running this greenlet?
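The spawn-then-wait-then-check pattern the answer recommends is not gevent-specific; concurrent.futures expresses the same idea with plain threads, where future.result() plays the role of greenlet.join() plus greenlet.get(). A minimal sketch, with fetch as a hypothetical stand-in for requests.get:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for requests.get(url); returns a fake status code.
    return 200

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fetch, "http://example.com")  # like gevent.spawn
    result = future.result()   # blocks until done, like join() + get()
    print(future.done(), result)  # → True 200
```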
Answer 3 (score: 0)
Wrap your urls and params in a list, then pop one pair at a time into a task pool (here the task pool holds a single task or is empty). Create threads that read tasks from the task pool; when a thread gets a task, it sends the request and then pops another pair from the list (i.e., the list is effectively acting as a task queue).
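The list-as-task-pool idea above can be sketched as follows: (url, params) pairs live in a list, and a lock-guarded pop() hands one pair at a time to whichever thread is free. This is an illustrative sketch, with send_request standing in for the real HTTP POST:

```python
import threading

tasks = [("http://example.com/%d" % i, {"n": i}) for i in range(20)]
lock = threading.Lock()
sent = []

def send_request(url, params):
    # Stand-in for the real POST; just record the url.
    with lock:
        sent.append(url)

def worker():
    while True:
        with lock:
            if not tasks:
                return          # list exhausted, thread dies
            url, params = tasks.pop()  # hand one task to this thread
        send_request(url, params)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(sent))  # → 20
```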