Handling ensure_future and its missing tasks

Asked: 2019-03-05 19:41:13

Tags: python-3.x python-asyncio aiohttp

I have a streaming application that almost continuously takes data as input, uses each value to send an HTTP request, and then does something with the returned value.

To speed things up, I am using the asyncio and aiohttp libraries in Python 3.7 to get the best performance, but given how fast the data moves, it is becoming hard to debug.

Here is what my code looks like:

'''
Gets the final requests
'''
async def apiRequest(info, url, session, reqType, post_data=''):
    if reqType:
        async with session.post(url, data=post_data) as response:
            info['response'] = await response.text()
    else:
        async with session.get(url + post_data) as response:
            info['response'] = await response.text()
    logger.debug(info)
    return info

'''
Loops through the batches and sends it for request
'''
async def main(data, listOfData):
    tasks = []
    async with ClientSession() as session:
        for reqData in listOfData:
            try:
                task = asyncio.ensure_future(apiRequest(**reqData))
                tasks.append(task)
            except Exception as e:
                print(e)
                exc_type, exc_obj, exc_tb = sys.exc_info()
                fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                print(exc_type, fname, exc_tb.tb_lineno)
        responses = await asyncio.gather(*tasks)
    return responses #list of APIResponses

'''
Streams data in and prepares batches to send for requests
'''
async def Kconsumer(data, loop, batchsize=100):
        consumer = AIOKafkaConsumer(**KafkaConfigs)
        await consumer.start()
        dataPoints = []
        async for msg in consumer:
            try:
                sys.stdout.flush()
                consumedMsg = loads(msg.value.decode('utf-8'))
                if consumedMsg['tid']:
                    dataPoints.append(loads(msg.value.decode('utf-8')))
                if len(dataPoints)==batchsize or time.time() - startTime>5:
                    '''
                    #1: The task below goes and sends HTTP GET requests in bulk using aiohttp
                    '''
                    task = asyncio.ensure_future(getRequests(data, dataPoints))
                    res = await asyncio.gather(*[task])
                    if task.done():
                        outputs = []
                        '''
                        #2: Does some ETL on the returned values
                        '''
                        ids = await asyncio.gather(*[doSomething(**{'tid':x['tid'],
                                                'cid':x['cid'], 'tn':x['tn'],
                                                'id':x['id'], 'ix':x['ix'],
                                                'ac':x['ac'], 'output':to_dict(xmltodict.parse(x['response'],encoding='utf-8')),
                                                'loop':loop, 'option':1}) for x in res[0]])
                        simplySaveDataIntoDataBase(id) # This is where I see some missing data in the database
                    dataPoints = []
            except Exception as e:
                    logger.error(e)
                    logger.error(traceback.format_exc())
                    exc_type, exc_obj, exc_tb = sys.exc_info()
                    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
                    logger.error(str(exc_type) +' '+ str(fname) +' '+ str(exc_tb.tb_lineno))


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    asyncio.ensure_future(Kconsumer(data, loop, batchsize=100))
    loop.run_forever()

Do I need an awaited version of ensure_future? How does aiohttp handle requests that arrive a little later than the others? Shouldn't it hold on to the whole batch instead of forgetting about it?

1 Answer:

Answer 0 (score: 0)

  

Do I need an awaited version of ensure_future?

Yes, and your code already does that: await asyncio.gather(*tasks) awaits the provided tasks and returns their results in the same order.

Note that await asyncio.gather(*[task]) doesn't make sense, because it is equivalent to await asyncio.gather(task), which in turn is equivalent to await task. In other words, when you need the result of getRequests(data, dataPoints), you can simply write res = await getRequests(data, dataPoints), without the ceremony of first invoking ensure_future() and then calling gather().
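A minimal sketch of that equivalence, with a stand-in fetch coroutine in place of the question's getRequests (the names and values here are made up for illustration):

```python
import asyncio

async def fetch(n):
    # Stand-in for an aiohttp request; just simulates latency.
    await asyncio.sleep(0.01)
    return n * 2

async def main():
    # The roundabout form from the question:
    task = asyncio.ensure_future(fetch(21))
    res1 = await asyncio.gather(*[task])   # a one-element list: [42]

    # The equivalent direct form:
    res2 = await fetch(21)                 # just 42

    print(res1, res2)   # prints: [42] 42

asyncio.run(main())
```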

In fact, you almost never need to call ensure_future() yourself:

  • if you need to await multiple tasks, you can pass the coroutine objects directly to gather, e.g. gather(coroutine1(), coroutine2());
  • if you need to spawn a background task, you can call asyncio.create_task(coroutine(...)).
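Both patterns can be sketched as follows (coroutine1 and coroutine2 are placeholder coroutines, not part of the original code):

```python
import asyncio

async def coroutine1():
    await asyncio.sleep(0.01)
    return 'one'

async def coroutine2():
    await asyncio.sleep(0.02)
    return 'two'

async def main():
    # Awaiting multiple tasks: pass coroutine objects straight to gather();
    # it wraps them in tasks internally and returns results in order.
    results = await asyncio.gather(coroutine1(), coroutine2())

    # Spawning a background task (Python 3.7+): create_task() schedules it
    # immediately; await it later only when you need the result.
    background = asyncio.create_task(coroutine1())
    # ... do other work here while the task runs ...
    first = await background

    print(results, first)   # prints: ['one', 'two'] one

asyncio.run(main())
```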
  

How does aiohttp handle requests that arrive a little later than the others? Shouldn't it hold on to the whole batch instead of forgetting about it?

If you use gather(), all requests must finish before any of them return. (That is not an aiohttp policy, it's how gather() works.) If you need to implement a timeout, you can use asyncio.wait_for or similar.
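A sketch of bounding a whole batch with a timeout (the 5-second limit and the fetch helper are made up for illustration, not taken from the original code):

```python
import asyncio

async def fetch(delay, value):
    # Stand-in for an individual aiohttp request with a given latency.
    await asyncio.sleep(delay)
    return value

async def main():
    batch = [fetch(0.01, 'fast'), fetch(0.02, 'slower')]
    try:
        # gather() itself only returns once every request has finished;
        # wait_for() bounds how long we are willing to wait for the batch.
        results = await asyncio.wait_for(asyncio.gather(*batch), timeout=5)
        print(results)   # prints: ['fast', 'slower']
    except asyncio.TimeoutError:
        # On timeout the gathered tasks are cancelled, so the batch is
        # abandoned as a whole rather than partially returned.
        print('batch timed out')

asyncio.run(main())
```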