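Hi all,
I have a flask-restplus server in place and I am trying to deliver an Excel file to my clients as an octet stream. When serializing large DataFrames, pandas.to_excel(..) seems to consume a lot of time (about 30 seconds for 120k rows).
Please see my current implementation below: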
import io

import pandas
from flask import make_response


def format(data_frame):
    # Idea is to directly write to the flask output stream, instead of buffering
    # the whole excel as io.BytesIO. Is there a way to do it?
    output = io.BytesIO()
    writer = pandas.ExcelWriter(output, engine='xlsxwriter')
    data_frame_ordered = data_frame.reindex_axis(sorted(data_frame.columns), axis=1)
    # This consumes a lot of time
    data_frame_ordered.to_excel(writer, sheet_name='ML Data', na_rep=0, index=False, encoding='utf-8')
    # This consumes a lot of time, too.
    writer.save()
    return output.getvalue()


@api.route('/excel', methods=['GET'])
class ExcelResource(Resource):
    def get(self, args):
        # Well, that's a huge pandas.DataFrame
        data_frame = ...
        resp = make_response(format(data_frame))
        resp.headers['Content-Length'] = resp.content_length
        resp.headers['Content-Type'] = 'application/octet-stream'
        return resp
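Is there a way to write the Excel file directly to the Flask output stream, without buffering it in a BytesIO instance first?
Thanks in advance,
Dennis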
Answer 0 (score: 1)
You could try creating a file-like object that gives you a streaming interface, for example:
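import io
import threading
from queue import Queue  # Python 2: from Queue import Queue

import pandas
from flask import Response


class StreamWriter(object):
    """File-like object that hands every written chunk to a reader via a queue."""

    def __init__(self):
        self.queue = Queue()
        self.position = 0

    def write(self, some):
        self.queue.put(some)
        self.position += len(some)

    def read(self):
        # Blocks until the writer thread has produced the next chunk
        return self.queue.get(True)

    def flush(self):
        pass

    def tell(self):
        # Assumption: reporting the number of bytes written so far is enough
        return self.position

    def seek(self, offset, whence=0):
        # Assumption: a stream cannot go back, so refuse instead of silently passing
        raise io.UnsupportedOperation('seek')

    def close(self):
        # None marks the end of the stream for the reader
        self.queue.put(None)


@api.route('/excel', methods=['GET'])
class ExcelResource(Resource):
    def get(self, args):
        # Well, that's a huge pandas.DataFrame
        data_frame = ...

        def generate():
            output = StreamWriter()

            def do_stuff():
                # Write into the outer `output`; creating a second StreamWriter
                # here would leave the reader loop below waiting forever.
                # reindex_axis and save() are the older pandas spellings from the
                # question; newer releases use reindex(columns=...) and close().
                writer = pandas.ExcelWriter(output, engine='xlsxwriter')
                data_frame_ordered = data_frame.reindex_axis(sorted(data_frame.columns), axis=1)
                # This consumes a lot of time
                data_frame_ordered.to_excel(writer, sheet_name='ML Data', na_rep=0, index=False, encoding='utf-8')
                # This consumes a lot of time, too.
                writer.save()
                output.close()

            threading.Thread(target=do_stuff).start()
            while True:
                chunk = output.read()
                if chunk is None:
                    break
                yield chunk

        return Response(generate(), headers={some_headers})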
This is only a rough idea; the code is untested!
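To try the producer/consumer idea in isolation, here is a minimal sketch that uses only the standard library. It swaps in `zipfile` for the Excel writer, on the grounds that an .xlsx file is a zip container; whether pandas and xlsxwriter accept such an object in your versions is an assumption you would still need to check. The names `QueueWriter` and `stream_zip` are made up for this illustration.

```python
import io
import queue
import threading
import zipfile


class QueueWriter(object):
    """Write-only, non-seekable file-like object backed by a queue."""

    def __init__(self):
        self._queue = queue.Queue()
        self._position = 0

    def write(self, data):
        data = bytes(data)
        self._queue.put(data)
        self._position += len(data)
        return len(data)

    def tell(self):
        # Number of bytes written so far
        return self._position

    def seekable(self):
        return False

    def seek(self, offset, whence=0):
        # Refusing to seek makes zipfile fall back to its streaming mode
        raise io.UnsupportedOperation("seek")

    def flush(self):
        pass

    def close(self):
        # None is the end-of-stream marker for chunks()
        self._queue.put(None)

    def chunks(self):
        """Yield written chunks until the writer has been closed."""
        while True:
            chunk = self._queue.get()
            if chunk is None:
                return
            yield chunk


def stream_zip(members):
    """Yield the bytes of a zip archive while a worker thread writes it.

    members is an iterable of (name, payload_bytes) pairs.
    """
    sink = QueueWriter()

    def produce():
        try:
            with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as archive:
                for name, payload in members:
                    archive.writestr(name, payload)
        finally:
            # Always unblock the consumer, even if writing failed
            sink.close()

    threading.Thread(target=produce, daemon=True).start()
    return sink.chunks()
```

In a Flask view, the generator returned by `stream_zip(...)` could be passed to `Response(...)` in the same way as `generate()` above. Note that if the worker thread fails, the client simply receives a truncated stream, so real code would want some error signalling on top of this.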