If I have an iterable containing strings, is there a simple way to turn it into a stream? I want to do something like this:
def make_file():
    yield "hello\n"
    yield "world\n"

output = tarfile.TarFile(…)
stream = iterable_to_stream(make_file())
output.addfile(…, stream)
Answer 0 (score: 24)
Python 3 has a new I/O stream API (library docs), replacing the old file-like object protocol. (The new API is also available in Python 2 via the io module, and it is backwards-compatible with the file-like object protocol.)
Here's an implementation for the new API, in Python 2 and 3:
import io

def iterable_to_stream(iterable, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """
    Lets you use an iterable (e.g. a generator) that yields bytestrings
    as a read-only input stream.

    The stream implements Python 3's newer I/O API (available in Python 2's io module).
    For efficiency, the stream is buffered.
    """
    class IterStream(io.RawIOBase):
        def __init__(self):
            self.leftover = None

        def readable(self):
            return True

        def readinto(self, b):
            try:
                l = len(b)  # We're supposed to return at most this much
                chunk = self.leftover or next(iterable)
                output, self.leftover = chunk[:l], chunk[l:]
                b[:len(output)] = output
                return len(output)
            except StopIteration:
                return 0    # indicate EOF

    return io.BufferedReader(IterStream(), buffer_size=buffer_size)
Usage example:
with iterable_to_stream(str(x**2).encode('utf8') for x in range(11)) as s:
    print(s.read())
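To tie this back to the original tarfile use case, here is a rough sketch (the member name and the up-front size computation are illustrative assumptions, not part of the answer; TarFile.addfile() copies exactly tarinfo.size bytes from the stream, so the total size must be known in advance):

import tarfile

chunks = [b"hello\n", b"world\n"]
stream = iterable_to_stream(iter(chunks))    # pass an iterator, since the function calls next() on it

info = tarfile.TarInfo(name="greeting.txt")  # hypothetical member name
info.size = sum(len(c) for c in chunks)      # addfile() reads exactly this many bytes

with tarfile.open("example.tar", "w") as tar:
    tar.addfile(info, stream)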
Answer 1 (score: 12)
Since there doesn't seem to be a "standard" way of doing it, I've banged together a simple implementation:
class iter_to_stream(object):
    def __init__(self, iterable):
        self.buffered = ""
        self.iter = iter(iterable)

    def read(self, size):
        result = ""
        while size > 0:
            data = self.buffered or next(self.iter, None)
            self.buffered = ""
            if data is None:
                break
            size -= len(data)
            if size < 0:
                data, self.buffered = data[:size], data[size:]
            result += data
        return result
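A quick usage sketch (the input strings are arbitrary examples):

s = iter_to_stream(["hello ", "wor", "ld"])
print(s.read(8))    # 'hello wo' -- chunks are split and re-joined as needed
print(s.read(100))  # 'rld'      -- a short read once the iterable is exhausted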
Answer 2 (score: 12)
Here's my streaming iterator, from an experimental branch of urllib3 that supports streaming chunked requests via iterables:
class IterStreamer(object):
    """
    File-like streaming iterator.
    """
    def __init__(self, generator):
        self.generator = generator
        self.iterator = iter(generator)
        self.leftover = ''

    def __len__(self):
        return self.generator.__len__()

    def __iter__(self):
        return self.iterator

    def next(self):
        return self.iterator.next()

    def read(self, size):
        data = self.leftover
        count = len(self.leftover)
        if count < size:
            try:
                while count < size:
                    chunk = self.next()
                    data += chunk
                    count += len(chunk)
            except StopIteration:
                pass
        self.leftover = data[size:]
        return data[:size]
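A rough usage sketch, not from the linked source, assuming plain str chunks and Python 2 (the class relies on the old .next() protocol):

streamer = IterStreamer(iter(["foo", "bar", "baz"]))
print(streamer.read(4))   # 'foob'  -- pulls chunks until 4 characters are available
print(streamer.read(10))  # 'arbaz' -- drains the leftover plus the remaining chunks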
Source with context: https://github.com/shazow/urllib3/blob/filepost-stream/urllib3/filepost.py#L23
Relevant unit test: https://github.com/shazow/urllib3/blob/filepost-stream/test/test_filepost.py#L9
Alas, this code hasn't made it into the stable branch yet, since sizeless chunked requests are poorly supported, but it should be a good foundation for what you're trying to do. See the source link for examples showing how it can be used.
Answer 3 (score: 4)
A starting point:
class iterable_to_stream:
    def __init__(self, iterable):
        self.iter = iter(iterable)

    def read(self):
        try:
            return next(self.iter)
        except StopIteration:
            return ""
Answer 4 (score: 0)
TarFile takes anything that provides a file-like interface, so you could either use StringIO (io.StringIO if you are using Python 3.x) to produce what you need, or you could create your own class that provides a file-like interface and produces what you need.
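A minimal sketch along those lines, buffering the whole iterable in memory first (in Python 3 tarfile works on bytes, so io.BytesIO is used here; the member name is illustrative):

import io
import tarfile

data = "".join(make_file()).encode("utf8")   # materialize the iterable up front
buf = io.BytesIO(data)

info = tarfile.TarInfo(name="greeting.txt")  # hypothetical member name
info.size = len(data)

with tarfile.open("example.tar", "w") as tar:
    tar.addfile(info, buf)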
Answer 5 (score: 0)
A modified version of Mechanical snail's fine answer. Here, the readinto(b) implementation makes several calls to the underlying iterator, in order to gather as many bytes as possible for the size of the given writable byte-like object b.
import io
from typing import Optional


class IteratorReader(io.RawIOBase):
    def __init__(self, iterator):
        self.iterator = iterator
        self.leftover = []

    def readinto(self, buffer: bytearray) -> Optional[int]:
        size = len(buffer)
        while len(self.leftover) < size:
            try:
                self.leftover.extend(next(self.iterator))
            except StopIteration:
                break

        if len(self.leftover) == 0:
            return 0

        output, self.leftover = self.leftover[:size], self.leftover[size:]
        buffer[:len(output)] = output
        return len(output)

    def readable(self) -> bool:
        return True
And usage:
def iterator1():
    for i in ('a', 'b', 'c', 'd', 'e', 'f', 'g'):
        res = i * 3
        yield res.encode("utf8")


iterreader = IteratorReader(iterator1())
while True:
    r = iterreader.read(4)
    if not r:
        break
    print(r)