I want to send 100K-300K POST requests to an API endpoint, built from a list of JSON objects I am iterating over. Unfortunately, the largest chunk size the API allows is 10 events per request, which greatly slows down sending all the events I want. After defining the list of JSON objects:
import json
import requests

chunkSize = 10
for i in xrange(0, len(list_of_JSON), chunkSize):
    chunk = list_of_JSON[i:i + chunkSize]  # 10 events per request
    endpoint = ""
    str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
    try:
        url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
        r = requests.post(url)
        print r.content
        print i
    except requests.RequestException:
        print 'failed'
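For reference, the chunking step can be factored into a small helper, which keeps the request loop itself short (a Python 3 sketch; `list_of_JSON` here is a dummy stand-in for the real event list):

```python
import json

def chunked(items, size=10):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Each chunk is serialized as one JSON array, as in the loop above.
list_of_JSON = [{"event": n} for n in range(25)]
payloads = ["[" + ",".join(json.dumps(x) for x in c) + "]"
            for c in chunked(list_of_JSON)]
```

With 25 dummy events and a chunk size of 10 this produces three payloads: two of 10 events and a final one of 5.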
This process sends events very slowly. I have looked into multithreading/concurrency/parallel processing, although I am completely new to the topic. After some research, I came up with this ugly snippet:
import json
import logging
import threading
import time

import requests

logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',
                    )

def worker():
    logging.debug('Starting')
    chunkSize = 10
    # first half of the list
    for i in xrange(0, len(list_of_JSON) / 2, chunkSize):
        chunk = list_of_JSON[i:i + chunkSize]  # 10
        endpoint = ""
        str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
        try:
            url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
            r = requests.post(url)
            print r.content
            print i
        except requests.RequestException:
            print 'failed'
    time.sleep(2)
    logging.debug('Exiting')

def my_service():
    logging.debug('Starting')
    chunkSize = 10
    # second half of the list; starting at len/2 (not len/2 + 1) so no
    # element is skipped between the two threads
    for i in xrange(len(list_of_JSON) / 2, len(list_of_JSON), chunkSize):
        chunk = list_of_JSON[i:i + chunkSize]  # 10
        endpoint = ""
        str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
        try:
            url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
            r = requests.post(url)
            print r.content
            print i
        except requests.RequestException:
            print 'failed'
    time.sleep(3)
    logging.debug('Exiting')

t = threading.Thread(target=my_service)
w = threading.Thread(target=worker)
w.start()
t.start()
Any advice or refactoring would be appreciated.
EDIT: I believe my implementation does what I want. I have looked at "What is the fastest way to send 100,000 HTTP requests in Python?", but I am still not sure how Pythonic or efficient this solution is.
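For comparison, the standard-library `concurrent.futures` module expresses the same split-the-work idea more compactly than two hand-rolled threads (a Python 3 sketch; `base_api` and `api_key` are placeholders, and the actual network call is stubbed out so the example runs offline):

```python
import json
from concurrent.futures import ThreadPoolExecutor

base_api = "https://example.com/api"  # placeholder endpoint
api_key = "KEY"                       # placeholder key

def build_url(chunk):
    """Build the same GET-style URL the original loop posts to."""
    event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
    return base_api + "?api_key=" + api_key + "&event=" + event

def send_chunk(chunk):
    url = build_url(chunk)
    # In real use: return requests.post(url).status_code
    return url  # stubbed so the sketch runs without a network

# Split the event list into chunks of 10 and send them in parallel.
list_of_JSON = [{"n": i} for i in range(30)]
chunks = [list_of_JSON[i:i + 10] for i in range(0, len(list_of_JSON), 10)]

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(send_chunk, chunks))
```

`pool.map` preserves input order and propagates exceptions from the worker function, so failed chunks surface instead of being silently swallowed by a bare `except`.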
Answer 0 (score: -1)
You can use scrapy, which is built on twisted (as noted in the comments). Scrapy is a framework for scraping web pages, but you can also use it to send POST requests. A spider equivalent to your code would look more or less like this:
import json
from urllib import urlencode  # urllib.parse.urlencode in Python 3

import scrapy

class EventUploader(scrapy.Spider):
    name = 'event_uploader'  # spiders must have a name
    BASE_URL = 'http://stackoverflow.com/'  # Example url

    def start_requests(self):
        for chunk in list_of_JSON:
            get_parameters = {
                'api_key': api_key,
                'event': json.dumps(chunk),  # json.dumps can encode lists too
            }
            url = "{}endpoint?{}".format(
                self.BASE_URL, urlencode(get_parameters))
            yield scrapy.FormRequest(url, formdata={}, callback=self.next)

    def next(self, response):
        # here you can assert everything went ok
        pass
Once your spider is in place, you can use Scrapy middlewares and settings to throttle your requests. You run your uploader like this:

scrapy runspider my_spider.py
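To make the throttling concrete, Scrapy's built-in settings can be declared directly on the spider via `custom_settings` (the values below are illustrative defaults to tune, not recommendations):

```python
# Illustrative throttling settings; put these in the spider's
# custom_settings attribute or the project's settings.py.
custom_settings = {
    "CONCURRENT_REQUESTS": 16,     # requests in flight at once
    "DOWNLOAD_DELAY": 0.25,        # seconds between requests to a domain
    "AUTOTHROTTLE_ENABLED": True,  # adapt the delay to server latency
    "RETRY_TIMES": 2,              # retry failed uploads a couple of times
}
```

With `AUTOTHROTTLE_ENABLED`, Scrapy adjusts the delay dynamically, which is useful when the API rate limit is unknown.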