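Adapt an iterator to behave like a file-like object in Python

Time: 2012-09-26 02:04:12

Tags: python

I have a generator that yields a sequence of strings. Is there a utility/adaptor in Python that can make it look like a file?

For example,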

>>> def str_fn():
...     for c in 'a', 'b', 'c':
...         yield c * 3
... 
>>> for s in str_fn():
...     print s
... 
aaa
bbb
ccc
>>> stream = some_magic_adaptor(str_fn())
>>> while True:
...    data = stream.read(4)
...    if not data:
...        break
...    print data
aaab
bbcc
c

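Because the data may be large and needs to be streamable (each fragment is a few kilobytes, the whole stream is tens of megabytes), I do not want to eagerly evaluate the whole generator before passing it to the stream adaptor.

8 answers:

Answer 0 (score: 11)

Here is a solution that should read from your iterator in chunks.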

class some_magic_adaptor:
  def __init__( self, it ):
    self.it = it
    self.next_chunk = ""
  def growChunk( self ):
    self.next_chunk = self.next_chunk + next(self.it)
  def read( self, n ):
    if self.next_chunk is None:
      return None
    try:
      while len(self.next_chunk)<n:
        self.growChunk()
      rv = self.next_chunk[:n]
      self.next_chunk = self.next_chunk[n:]
      return rv
    except StopIteration:
      rv = self.next_chunk
      self.next_chunk = None
      return rv


def str_fn():
  for c in 'a', 'b', 'c':
    yield c * 3

ff = some_magic_adaptor( str_fn() )

while True:
  data = ff.read(4)
  if not data:
    break
  print data

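Answer 1 (score: 8)

The "correct" way to do this is to inherit from one of the standard Python io abstract base classes. However, it seems that Python does not allow you to provide a raw text class and wrap it with any kind of buffered reader.

The best class to inherit from is TextIOBase. Here is an implementation that handles readline and read while paying attention to performance. (gist)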

import io

class StringIteratorIO(io.TextIOBase):

    def __init__(self, iter):
        self._iter = iter
        self._left = ''

    def readable(self):
        return True

    def _read1(self, n=None):
        while not self._left:
            try:
                self._left = next(self._iter)
            except StopIteration:
                break
        ret = self._left[:n]
        self._left = self._left[len(ret):]
        return ret

    def read(self, n=None):
        l = []
        if n is None or n < 0:
            while True:
                m = self._read1()
                if not m:
                    break
                l.append(m)
        else:
            while n > 0:
                m = self._read1(n)
                if not m:
                    break
                n -= len(m)
                l.append(m)
        return ''.join(l)

    def readline(self):
        l = []
        while True:
            i = self._left.find('\n')
            if i == -1:
                l.append(self._left)
                try:
                    self._left = next(self._iter)
                except StopIteration:
                    self._left = ''
                    break
            else:
                l.append(self._left[:i+1])
                self._left = self._left[i+1:]
                break
        return ''.join(l)

Answer 2 (score: 5)

The problem with StringIO is that you have to load everything into the buffer up front. This can be a problem if the generator is infinite :)

from itertools import chain, islice
class some_magic_adaptor(object):
    def __init__(self, src):
        self.src = chain.from_iterable(src)
    def read(self, n):
        return "".join(islice(self.src, None, n))
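
To illustrate the point about infinite generators, here is a small sketch (assuming Python 3) that feeds the adaptor above an endless source. The generator name endless is hypothetical and only there for the demonstration; the adaptor is repeated so the example runs on its own.

```python
from itertools import chain, count, islice


class some_magic_adaptor(object):
    def __init__(self, src):
        # Flatten the stream of strings into a lazy stream of characters.
        self.src = chain.from_iterable(src)

    def read(self, n):
        # Take at most n characters; returns "" once the source is exhausted.
        return "".join(islice(self.src, n))


def endless():
    # Hypothetical infinite generator: "0,", "1,", "2,", ...
    for i in count():
        yield "%d," % i


stream = some_magic_adaptor(endless())
first = stream.read(4)   # "0,1,"
second = stream.read(4)  # "2,3,"
print(first, second)
```

Only as many items as each read needs are pulled from the generator. The trade-off is that the data is handled one character at a time, which may be slow for streams of tens of megabytes.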

Answer 3 (score: 4)

There is one called werkzeug.contrib.iterio.IterIO, but note that it stores the entire iterator in memory (up to the point you have read it as a file), so it may not be suitable.

http://werkzeug.pocoo.org/docs/contrib/iterio/

Source: https://github.com/mitsuhiko/werkzeug/blob/master/werkzeug/contrib/iterio.py

A bug in readline/iter: https://github.com/mitsuhiko/werkzeug/pull/500

Answer 4 (score: 4)

Here is a modified version of John's and Matt's answers that can read a list/generator of strings and outputs bytearrays
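
The definition of IterStringIO is not reproduced in this answer. A minimal sketch of what such a class could look like, assuming Python 3, is given here. It is not the answer author's code: the constructor signature, the UTF-8 encoding, and the internals are all assumptions chosen to match the usage shown in this answer.

```python
import io
from itertools import chain, islice


class IterStringIO(io.TextIOBase):
    """Hypothetical sketch: lazily chains str chunks, read() returns a bytearray."""

    def __init__(self, iterable=None):
        # Lazy stream of byte values still to be read; starts empty.
        self._iter = iter(())
        if iterable is not None:
            self.write(iterable)

    def readable(self):
        return True

    def write(self, content):
        # Accept either a single string or an iterable of strings.
        chunks = [content] if isinstance(content, str) else content
        # Assumption: text is encoded as UTF-8.
        encoded = (chunk.encode("utf-8") for chunk in chunks)
        # Append lazily; nothing is pulled from `content` until read().
        self._iter = chain(self._iter, chain.from_iterable(encoded))

    def read(self, n=None):
        # None or a negative size means "read everything that is left".
        stop = None if n is None or n < 0 else n
        return bytearray(islice(self._iter, stop))


ff = IterStringIO(c * 3 for c in ["a", "b", "c"])
chunk = ff.read(4)  # bytearray(b'aaab')
```

Because write only chains iterators together, a generator passed in is not consumed until read asks for its data.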

ff = IterStringIO(c * 3 for c in ['a', 'b', 'c'])

while True:
    data = ff.read(4)

    if not data:
        break

    print data

aaab
bbcc
c

Alternate usage:

ff = IterStringIO()
ff.write('ddd')
ff.write(c * 3 for c in ['a', 'b', 'c'])

while True:
    data = ff.read(4)

    if not data:
        break

    print data

ddda
aabb
bccc


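Answer 5 (score: 2)

Looking at Matt's answer, I can see that it is not always necessary to implement all of the read methods. read1 may be sufficient, which is described as:

  

Read and return up to size bytes, with at most one call to the underlying raw stream's read()...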

It can then be wrapped with io.TextIOWrapper, which, for example, has an implementation of readline. As an example, here is streaming of a CSV file from S3's (Amazon Simple Storage Service) boto.s3.key.Key, which implements an iterator for reading.

import io
import csv

from boto import s3


class StringIteratorIO(io.TextIOBase):

    def __init__(self, iter):
        self._iterator = iter
        self._buffer = ''

    def readable(self):
        return True

    def read1(self, n=None):
        while not self._buffer:
            try:
                self._buffer = next(self._iterator)
            except StopIteration:
                break
        result = self._buffer[:n]
        self._buffer = self._buffer[len(result):]
        return result


conn = s3.connect_to_region('some_aws_region')
bucket = conn.get_bucket('some_bucket')
key = bucket.get_key('some.csv')

fp = io.TextIOWrapper(StringIteratorIO(key))
reader = csv.DictReader(fp, delimiter = ';')
for row in reader:
    print(row)

Update

Here is an answer to a related question that looks a little better. It inherits io.RawIOBase and overrides readinto. In Python 3 that is sufficient, so instead of wrapping IterStream in io.BufferedReader, one can wrap it in io.TextIOWrapper. In Python 2 read1 is needed, but it can be expressed simply via readinto.
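
The io.RawIOBase approach described in the update can be sketched roughly as follows. This is a minimal illustration for Python 3, not the code from the linked answer: the class name IterStream comes from the text above, while the constructor argument and the sample data are assumptions. The source must yield bytes objects.

```python
import io


class IterStream(io.RawIOBase):
    """Sketch of a raw stream backed by an iterator of bytes chunks."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = iter(chunks)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, buf):
        # Skip empty chunks so that returning 0 always means end of stream.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0
        # Copy at most len(buf) bytes and keep the rest for the next call.
        out = self._leftover[:len(buf)]
        self._leftover = self._leftover[len(out):]
        buf[:len(out)] = out
        return len(out)


def str_fn():
    for c in "a", "b", "c":
        yield (c * 3).encode("utf-8")


# Binary access: BufferedReader supplies read(n) on top of readinto.
stream = io.BufferedReader(IterStream(str_fn()))
pieces = []
while True:
    data = stream.read(4)
    if not data:
        break
    pieces.append(data)

# Text access: TextIOWrapper adds decoding, readline and line iteration.
text = io.TextIOWrapper(
    io.BufferedReader(IterStream([b"x,y\n", b"1,", b"2\n"])),
    encoding="utf-8",
)
lines = list(text)
```

Only readinto has to be written by hand here; buffering and line handling come from the standard io wrappers.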

Answer 6 (score: 1)

This is exactly what StringIO is for..

>>> import StringIO
>>> some_var = StringIO.StringIO("Hello World!")
>>> some_var.read(4)
'Hell'
>>> some_var.read(4)
'o Wo'
>>> some_var.read(4)
'rld!'
>>>

Or, if you want to do what it sounds like:

class MyString(StringIO.StringIO):
     def __init__(self,*args):
         StringIO.StringIO.__init__(self,"".join(args))

Then you can simply

xx = MyString(*list_of_strings)

Answer 7 (score: -1)

First, your generator has to yield bytes objects. While there is nothing built in, you can use a combination of http://docs.python.org/library/stringio.html and itertools.chain.