How to convert an iterable to a stream?

Date: 2011-07-11 23:18:21

Tags: python stream iterator

If I have an iterable containing strings, is there a simple way to turn it into a stream? I want to do something like this:

def make_file():
    yield "hello\n"
    yield "world\n"

output = tarfile.TarFile(…)
stream = iterable_to_stream(make_file())
output.addfile(…, stream)

6 answers:

Answer 0 (score: 24)

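Python 3 has a new I/O stream API (library docs) that replaces the old file-like object protocol. (The new API is also available in Python 2 through the io module, and it is backwards-compatible with the file-like object protocol.)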
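The core idea of the new API is that a raw stream only has to implement readinto(); io.RawIOBase then supplies read(), readall() and the rest, and io.BufferedReader can sit on top. A minimal Python 3 sketch of that contract, using a hypothetical toy class RepeatReader that emits a fixed number of copies of one byte:

```python
import io


class RepeatReader(io.RawIOBase):
    """Hypothetical toy raw stream: produces `count` copies of the byte `fill`."""

    def __init__(self, fill: bytes, count: int):
        super().__init__()
        self.fill = fill
        self.remaining = count

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Copy at most len(b) bytes into the caller's buffer; returning 0 means EOF.
        n = min(len(b), self.remaining)
        b[:n] = self.fill * n
        self.remaining -= n
        return n


raw = RepeatReader(b"x", 5)
print(raw.read(2))  # b'xx'  (read() is inherited from RawIOBase)
print(raw.read())   # b'xxx' (read(-1) falls back to the inherited readall())

# The same raw stream can be wrapped in a BufferedReader unchanged.
print(io.BufferedReader(RepeatReader(b"y", 3)).read())  # b'yyy'
```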

Here's an implementation for the new API that works in both Python 2 and 3:

import io

def iterable_to_stream(iterable, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """
    Lets you use an iterable (e.g. a generator) that yields bytestrings as a read-only
    input stream.

    The stream implements Python 3's newer I/O API (available in Python 2's io module).
    For efficiency, the stream is buffered.
    """
    iterable = iter(iterable)  # accept any iterable (e.g. a list), not just iterators
    class IterStream(io.RawIOBase):
        def __init__(self):
            self.leftover = None
        def readable(self):
            return True
        def readinto(self, b):
            try:
                l = len(b)  # We're supposed to return at most this much
                chunk = self.leftover or next(iterable)
                while not chunk:  # skip empty chunks, which would otherwise look like EOF
                    chunk = next(iterable)
                output, self.leftover = chunk[:l], chunk[l:]
                b[:len(output)] = output
                return len(output)
            except StopIteration:
                return 0    # indicate EOF
    return io.BufferedReader(IterStream(), buffer_size=buffer_size)

Example usage:

with iterable_to_stream(str(x**2).encode('utf8') for x in range(11)) as s:
    print(s.read())
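Applied to the question's tarfile use case, a minimal Python 3 sketch might look like this. It assumes the total size is known in advance, because TarInfo.size must be set before addfile() copies exactly that many bytes from the stream. The file name greeting.txt is illustrative, and iterable_to_stream is a condensed copy of the function above so the block runs on its own.

```python
import io
import tarfile


def iterable_to_stream(iterable, buffer_size=io.DEFAULT_BUFFER_SIZE):
    # Condensed Python 3 copy of the answer's function.
    iterator = iter(iterable)

    class IterStream(io.RawIOBase):
        def __init__(self):
            super().__init__()
            self.leftover = b""

        def readable(self):
            return True

        def readinto(self, b):
            try:
                chunk = self.leftover or next(iterator)
                while not chunk:  # skip empty chunks so they are not mistaken for EOF
                    chunk = next(iterator)
            except StopIteration:
                return 0  # EOF
            output, self.leftover = chunk[:len(b)], chunk[len(b):]
            b[:len(output)] = output
            return len(output)

    return io.BufferedReader(IterStream(), buffer_size=buffer_size)


def make_file():
    # tarfile works with bytes on Python 3, so the generator yields bytes.
    yield b"hello\n"
    yield b"world\n"


# Assumption: the content length is known up front; addfile() needs it.
info = tarfile.TarInfo(name="greeting.txt")
info.size = len(b"hello\nworld\n")

archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    tar.addfile(info, iterable_to_stream(make_file()))

# Read the member back to check what was written.
archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    data = tar.extractfile("greeting.txt").read()
print(data)  # b'hello\nworld\n'
```

If the size is not known beforehand, the stream cannot be handed to addfile() directly; the content has to be measured (or buffered) first.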

Answer 1 (score: 12)

Since there doesn't seem to be a "standard" way of doing it, I've put together a simple implementation:

class iter_to_stream(object):
    def __init__(self, iterable):
        self.buffered = ""
        self.iter = iter(iterable)

    def read(self, size):
        result = ""
        while size > 0:
            data = self.buffered or next(self.iter, None)
            self.buffered = ""
            if data is None:
                break
            size -= len(data)
            if size < 0:
                # Overshot: keep the last -size characters for the next read().
                data, self.buffered = data[:size], data[size:]
            result += data
        return result
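The class above is Python 2 code that works on str chunks. A hypothetical Python 3 port for bytes chunks might look like the sketch below (the name IterToStream is illustrative). It also accepts read() / read(-1) as "read everything", which is the usual file-protocol default, and collects pieces in a list so that repeated concatenation does not become quadratic.

```python
class IterToStream:
    """Sketch of a Python 3 port of iter_to_stream; chunks must be bytes."""

    def __init__(self, iterable):
        self.buffered = b""
        self.iter = iter(iterable)

    def read(self, size=-1):
        # A negative or None size means "read until the iterable is exhausted".
        if size is None or size < 0:
            data = self.buffered + b"".join(self.iter)
            self.buffered = b""
            return data
        parts = []
        while size > 0:
            data = self.buffered or next(self.iter, None)
            self.buffered = b""
            if data is None:
                break  # iterable exhausted
            if len(data) > size:
                # Keep the unread tail for the next call.
                data, self.buffered = data[:size], data[size:]
            parts.append(data)
            size -= len(data)
        return b"".join(parts)


s = IterToStream([b"hello\n", b"world\n"])
print(s.read(3))  # b'hel'
print(s.read(5))  # b'lo\nwo'
print(s.read())   # b'rld\n'
```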

Answer 2 (score: 12)

Here's my streaming iterator, from an experimental branch of urllib3 that supports streaming chunked requests via iterables:

class IterStreamer(object):
    """
    File-like streaming iterator.
    """
    def __init__(self, generator):
        self.generator = generator
        self.iterator = iter(generator)
        self.leftover = ''

    def __len__(self):
        return self.generator.__len__()

    def __iter__(self):
        return self.iterator

    def next(self):
        return next(self.iterator)  # builtin next() works on Python 2 and 3

    def read(self, size):
        data = self.leftover
        count = len(self.leftover)

        if count < size:
            try:
                while count < size:
                    chunk = self.next()
                    data += chunk
                    count += len(chunk)
            except StopIteration:
                pass

        self.leftover = data[size:]

        return data[:size]

Source and context: https://github.com/shazow/urllib3/blob/filepost-stream/urllib3/filepost.py#L23

Related unit test: https://github.com/shazow/urllib3/blob/filepost-stream/test/test_filepost.py#L9

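Alas, this code hasn't made it into the stable branch yet, because sizeless chunked requests are poorly supported, but it should be a good foundation for what you're trying to do. See the source link for examples of how it can be used.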

Answer 3 (score: 4)

A starting point:

class iterable_to_stream:
    def __init__(self, iterable):
        self.iter = iter(iterable)

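    def read(self, size=-1):
        # size is accepted (callers such as tarfile pass it) but ignored: each call
        # returns the next whole chunk, which may be longer or shorter than size.
        try:
            return next(self.iter)
        except StopIteration:
            return ""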

Answer 4 (score: 0)

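TarFile accepts anything that provides a file-like interface, so you can either use StringIO (on Python 3.x use io.BytesIO instead, since tarfile reads bytes and io.StringIO will not work there) to produce the content you need, or you can create your own class that provides a file-like interface and yields what you need.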
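A minimal Python 3 sketch of the in-memory option, assuming the whole content fits in RAM: join the chunks into one bytes object, wrap it in io.BytesIO, and take TarInfo.size from its length. The member name greeting.txt is illustrative.

```python
import io
import tarfile


def make_file():
    yield b"hello\n"
    yield b"world\n"


# Materialize the iterable; this also gives the size that TarInfo requires.
content = b"".join(make_file())
info = tarfile.TarInfo(name="greeting.txt")
info.size = len(content)

archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    tar.addfile(info, io.BytesIO(content))

# Read the archive back to confirm the member and its data.
archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    names = tar.getnames()
    stored = tar.extractfile("greeting.txt").read()
print(names, stored)  # ['greeting.txt'] b'hello\nworld\n'
```

The trade-off is memory: everything is held at once, which the streaming classes in the other answers avoid.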

Answer 5 (score: 0)

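A modified version of Mechanical snail's great answer. Here, the readinto(b) implementation calls the underlying iterator repeatedly, gathering as many bytes as fit into the given writable bytes-like object b.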

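import io
from typing import Optional


class IteratorReader(io.RawIOBase):

    def __init__(self, iterator):
        self.iterator = iter(iterator)
        # A bytearray (rather than a list of ints) can be assigned straight into
        # `buffer`, which BufferedReader passes in as a memoryview.
        self.leftover = bytearray()

    def readinto(self, buffer: bytearray) -> Optional[int]:
        size = len(buffer)
        # Keep pulling chunks until we have enough bytes or the iterator ends.
        while len(self.leftover) < size:
            try:
                self.leftover.extend(next(self.iterator))
            except StopIteration:
                break

        if len(self.leftover) == 0:
            return 0  # EOF

        output, self.leftover = self.leftover[:size], self.leftover[size:]
        buffer[:len(output)] = output
        return len(output)

    def readable(self) -> bool:
        return True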

And usage:

def iterator1():
    for i in ('a', 'b', 'c', 'd', 'e', 'f', 'g'):
        res = i * 3
        yield res.encode("utf8")


iterreader = IteratorReader(iterator1())
while True:
    r = iterreader.read(4)
    if not r:
        break
    print(r)
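Because IteratorReader is an io.RawIOBase, it can also be layered under the standard buffered and text wrappers, for example to iterate over lines whose boundaries don't line up with chunk boundaries. A Python 3 sketch, using a condensed copy of the class (with a bytearray buffer) and a hypothetical chunks() producer:

```python
import io
from typing import Iterable, Optional


class IteratorReader(io.RawIOBase):
    """Condensed copy of the class above."""

    def __init__(self, iterator: Iterable[bytes]):
        super().__init__()
        self.iterator = iter(iterator)
        self.leftover = bytearray()

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> Optional[int]:
        size = len(buffer)
        while len(self.leftover) < size:
            try:
                self.leftover.extend(next(self.iterator))
            except StopIteration:
                break
        output = self.leftover[:size]
        del self.leftover[:size]
        buffer[:len(output)] = output  # works for bytearray and memoryview buffers
        return len(output)  # 0 signals EOF


def chunks():
    # Hypothetical producer whose chunk boundaries split lines.
    yield b"first li"
    yield b"ne\nsecond line\nthi"
    yield b"rd line\n"


text = io.TextIOWrapper(io.BufferedReader(IteratorReader(chunks())), encoding="utf-8")
lines = [line.rstrip("\n") for line in text]
print(lines)  # ['first line', 'second line', 'third line']
```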