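The problem is that stdin does not support seek, which avro requires, so we read everything into a buffer and then pass that buffer to avro_wrapper. This worked in Python 2 but does not work in Python 3. I tried a few solutions, but none of them worked.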
# stdin doesn't support seek which is needed by avro... so this hack worked in python 2. This does not work in Python 3.
# Reading everything to buffer and then giving this to avro_wrapper.
buf = StringIO()
buf.write(args.input_file.read())
r = DataFileReader(buf, DatumReader())
# The very first record holds the header information: the header names in order, along with the munged header names for all record types.
# E.g. if we have 2 ports, it holds the header information of
# 1. port1 under the name1 key
# 2. port2 under the name2 key, and so on
headers_record = next(r)['headers']
The above produces the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 17: invalid continuation byte.
Then we tried this:
input_stream = io.TextIOWrapper(args.input_file.buffer, encoding='latin-1')
sio = io.StringIO(input_stream.read())
r = DataFileReader(sio, DatumReader())
headers_record = next(r)['headers']
This produces the error avro.schema.AvroException: Not an Avro data file: Obj doesn't match b'Obj\x01'.
Another attempt:
input_stream = io.TextIOWrapper(args.input_file.buffer, encoding='latin-1')
buf = io.BytesIO(input_stream.read().encode('latin-1'))
r = DataFileReader(buf.read(), DatumReader())
headers_record = next(r)['headers']
This produces the error AttributeError: 'bytes' object has no attribute 'seek'.
Answer 0 (score: 0)
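io.BytesIO() is the correct type to use to create a seekable in-memory file object containing binary data.
However, you made the mistake of reading the bytes data back out of the io.BytesIO() file object and passing that in, instead of the actual file object.
Don't read; pass in the actual io.BytesIO file object, filled with the binary data read from stdin: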
buf = io.BytesIO(args.input_file.buffer.read())
r = DataFileReader(buf, DatumReader())
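I passed in the args.input_file.buffer data directly, assuming that args.input_file is the TextIOWrapper instance that decodes the stdin bytes, and that .buffer is the underlying BufferedReader instance providing the raw binary data. There is no point in decoding this data as Latin-1 and then encoding it as Latin-1 again. Just pass the bytes along.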
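To make the point concrete without needing avro installed, here is a minimal sketch of the buffering step on its own. The helper name seekable_binary_stream and the fake_stdin stand-in are hypothetical, introduced only for illustration; the payload is a made-up byte string that starts with the b'Obj\x01' marker from the error message above and continues with bytes that are not valid UTF-8.

```python
import io


def seekable_binary_stream(stream):
    """Copy all bytes from a stream into a seekable in-memory file object.

    Hypothetical helper: if the stream is a text wrapper (like sys.stdin in
    Python 3), read from its underlying binary .buffer; otherwise assume the
    stream already yields bytes.
    """
    raw = getattr(stream, "buffer", stream)
    return io.BytesIO(raw.read())


# Stand-in for piped stdin: a text wrapper over bytes that are not valid UTF-8.
payload = b"Obj\x01" + bytes([0xF0, 0x28, 0x8C, 0x28])
fake_stdin = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")

buf = seekable_binary_stream(fake_stdin)

# The result is a file object, so a reader can inspect the header and rewind.
magic = buf.read(4)
buf.seek(0)
```

Assuming args.input_file is sys.stdin or a similar text-mode stream, the result would be passed as DataFileReader(seekable_binary_stream(args.input_file), DatumReader()). The key detail is that the reader receives the BytesIO object itself, not the bytes returned by calling .read() on it.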