I want to send 100K-300K POST requests to an API endpoint, built from a list of JSON objects I am iterating over. Unfortunately, the largest chunk size the API allows is 10 events per request, which greatly slows down sending all the events I want. After defining the list of JSON objects:
import json
import requests

chunkSize = 10
for i in xrange(0, len(list_of_JSON), chunkSize):
    chunk = list_of_JSON[i:i + chunkSize]  # 10 events per request
    endpoint = ""
    str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
    try:
        url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
        r = requests.post(url)
        print r.content
        print i
    except requests.RequestException:
        print 'failed'
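For reference, the chunking step can be factored into a small helper, which keeps the request loop itself short (a Python 3 sketch; `list_of_JSON` here is a dummy stand-in for the real event list):

```python
import json

def chunked(items, size=10):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Each chunk is serialized as one JSON array, as in the loop above.
list_of_JSON = [{"event": n} for n in range(25)]
payloads = ["[" + ",".join(json.dumps(x) for x in c) + "]"
            for c in chunked(list_of_JSON)]
```

With 25 dummy events and a chunk size of 10 this produces three payloads: two of 10 events and a final one of 5.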
This process sends events very slowly. I have looked into multithreading/concurrency/parallel processing, although I am completely new to the topic. After some research, I came up with this ugly snippet:
import json
import logging
import threading
import time

import requests

logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',
                    )

def worker():
    logging.debug('Starting')
    chunkSize = 10
    # first half of the list
    for i in xrange(0, len(list_of_JSON) / 2, chunkSize):
        chunk = list_of_JSON[i:i + chunkSize]  # 10
        endpoint = ""
        str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
        try:
            url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
            r = requests.post(url)
            print r.content
            print i
        except requests.RequestException:
            print 'failed'
    time.sleep(2)
    logging.debug('Exiting')

def my_service():
    logging.debug('Starting')
    chunkSize = 10
    # second half of the list; starting at len/2 (not len/2 + 1) so no
    # element is skipped between the two threads
    for i in xrange(len(list_of_JSON) / 2, len(list_of_JSON), chunkSize):
        chunk = list_of_JSON[i:i + chunkSize]  # 10
        endpoint = ""
        str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
        try:
            url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
            r = requests.post(url)
            print r.content
            print i
        except requests.RequestException:
            print 'failed'
    time.sleep(3)
    logging.debug('Exiting')

t = threading.Thread(target=my_service)
w = threading.Thread(target=worker)
w.start()
t.start()
Any advice or refactoring would be appreciated.
EDIT: I believe my implementation does what I want. I have looked at "What is the fastest way to send 100,000 HTTP requests in Python?", but I am still not sure how Pythonic or efficient this solution is.
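For comparison, the standard-library `concurrent.futures` module expresses the same split-the-work idea more compactly than two hand-rolled threads (a Python 3 sketch; `base_api` and `api_key` are placeholders, and the actual network call is stubbed out so the example runs offline):

```python
import json
from concurrent.futures import ThreadPoolExecutor

base_api = "https://example.com/api"  # placeholder endpoint
api_key = "KEY"                       # placeholder key

def build_url(chunk):
    """Build the same GET-style URL the original loop posts to."""
    event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
    return base_api + "?api_key=" + api_key + "&event=" + event

def send_chunk(chunk):
    url = build_url(chunk)
    # In real use: return requests.post(url).status_code
    return url  # stubbed so the sketch runs without a network

# Split the event list into chunks of 10 and send them in parallel.
list_of_JSON = [{"n": i} for i in range(30)]
chunks = [list_of_JSON[i:i + 10] for i in range(0, len(list_of_JSON), 10)]

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(send_chunk, chunks))
```

`pool.map` preserves input order and propagates exceptions from the worker function, so failed chunks surface instead of being silently swallowed by a bare `except`.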
Answer 0 (score: -1)
You can use scrapy, which is built on twisted (as noted in the comments). Scrapy is a framework for scraping web pages, but you can also use it to send POST requests. A spider equivalent to your code would look more or less like this:
import json
from urllib import urlencode  # urllib.parse.urlencode in Python 3

import scrapy

class EventUploader(scrapy.Spider):
    name = 'event_uploader'  # spiders must have a name
    BASE_URL = 'http://stackoverflow.com/'  # Example url

    def start_requests(self):
        for chunk in list_of_JSON:
            get_parameters = {
                'api_key': api_key,
                'event': json.dumps(chunk),  # json.dumps can encode lists too
            }
            url = "{}endpoint?{}".format(
                self.BASE_URL, urlencode(get_parameters))
            yield scrapy.FormRequest(url, formdata={}, callback=self.next)

    def next(self, response):
        # here you can assert everything went ok
        pass
Once your spider is in place, you can use Scrapy middlewares and settings to throttle your requests. You run your uploader like this:

scrapy runspider my_spider.py
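To make the throttling concrete, Scrapy's built-in settings can be declared directly on the spider via `custom_settings` (the values below are illustrative defaults to tune, not recommendations):

```python
# Illustrative throttling settings; put these in the spider's
# custom_settings attribute or the project's settings.py.
custom_settings = {
    "CONCURRENT_REQUESTS": 16,     # requests in flight at once
    "DOWNLOAD_DELAY": 0.25,        # seconds between requests to a domain
    "AUTOTHROTTLE_ENABLED": True,  # adapt the delay to server latency
    "RETRY_TIMES": 2,              # retry failed uploads a couple of times
}
```

With `AUTOTHROTTLE_ENABLED`, Scrapy adjusts the delay dynamically, which is useful when the API rate limit is unknown.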