I have a generator that produces a list of strings. Is there a utility or adapter in Python that can make it look like a file?
For example:

    >>> def str_fn():
    ...     for c in 'a', 'b', 'c':
    ...         yield c * 3
    ...
    >>> for s in str_fn():
    ...     print(s)
    ...
    aaa
    bbb
    ccc
    >>> stream = some_magic_adaptor(str_fn())
    >>> while True:
    ...     data = stream.read(4)
    ...     if not data:
    ...         break
    ...     print(data)
    aaab
    bbcc
    c

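Because the data may be large and needs to be streamable (each fragment is a few kilobytes, and the whole stream is tens of megabytes), I do not want to eagerly evaluate the whole generator before passing it to the stream adaptor.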
Answer 0 (score: 11)
Here is a solution that should read from your iterator in chunks.

    class some_magic_adaptor:
        def __init__(self, it):
            self.it = it
            self.next_chunk = ""

        def growChunk(self):
            # Pull one more fragment from the iterator into the buffer.
            self.next_chunk = self.next_chunk + next(self.it)

        def read(self, n):
            if self.next_chunk is None:
                # The iterator was exhausted on an earlier call.
                return None
            try:
                while len(self.next_chunk) < n:
                    self.growChunk()
                rv = self.next_chunk[:n]
                self.next_chunk = self.next_chunk[n:]
                return rv
            except StopIteration:
                # Return whatever is left and mark the stream as finished.
                rv = self.next_chunk
                self.next_chunk = None
                return rv

    def str_fn():
        for c in 'a', 'b', 'c':
            yield c * 3

    ff = some_magic_adaptor(str_fn())
    while True:
        data = ff.read(4)
        if not data:
            break
        print(data)

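The question asks that the generator not be evaluated eagerly. The sketch below is one way to check that a chunk-buffering adaptor of this kind only pulls as many fragments as a read needs. `ChunkReader`, `traced_str_fn` and `pulled` are hypothetical names introduced for illustration, and unlike the class above this variant returns an empty string at end of stream rather than None.

```python
class ChunkReader:
    """Hypothetical stand-in for the adaptor above; returns '' at end of stream."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buf = ""

    def read(self, n):
        # Pull fragments only until n characters are buffered.
        while len(self._buf) < n:
            try:
                self._buf += next(self._it)
            except StopIteration:
                break
        out, self._buf = self._buf[:n], self._buf[n:]
        return out


pulled = []  # records which fragments the generator has produced so far


def traced_str_fn():
    for c in "abc":
        pulled.append(c)
        yield c * 3


stream = ChunkReader(traced_str_fn())
first = stream.read(4)
pulled_after_first_read = list(pulled)  # snapshot before reading further
rest = [stream.read(4), stream.read(4), stream.read(4)]
print(first, pulled_after_first_read, rest)
```

After the first read only two of the three fragments have been requested from the generator, which is the streaming behaviour the question is after.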
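Answer 1 (score: 8)
The "correct" way to do this is to inherit from a standard Python io abstract base class. However, Python does not appear to let you provide a raw text class and wrap it with a buffered reader of any kind.
The best class to inherit from is TextIOBase. Here is an implementation that handles readline and read while being mindful of performance. (gist)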

    import io

    class StringIteratorIO(io.TextIOBase):

        def __init__(self, iter):
            self._iter = iter
            self._left = ''

        def readable(self):
            return True

        def _read1(self, n=None):
            while not self._left:
                try:
                    self._left = next(self._iter)
                except StopIteration:
                    break
            ret = self._left[:n]
            self._left = self._left[len(ret):]
            return ret

        def read(self, n=None):
            l = []
            if n is None or n < 0:
                while True:
                    m = self._read1()
                    if not m:
                        break
                    l.append(m)
            else:
                while n > 0:
                    m = self._read1(n)
                    if not m:
                        break
                    n -= len(m)
                    l.append(m)
            return ''.join(l)

        def readline(self):
            l = []
            while True:
                i = self._left.find('\n')
                if i == -1:
                    l.append(self._left)
                    try:
                        self._left = next(self._iter)
                    except StopIteration:
                        self._left = ''
                        break
                else:
                    l.append(self._left[:i+1])
                    self._left = self._left[i+1:]
                    break
            return ''.join(l)

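One practical benefit of inheriting from an io base class is that line iteration comes for free once readline is defined, so the object can be handed to consumers such as csv.reader. The sketch below uses a trimmed-down, hypothetical class (`LineIteratorIO`, which implements only readline with the same buffering logic as above) and made-up sample chunks to illustrate this on Python 3.

```python
import csv
import io


class LineIteratorIO(io.TextIOBase):
    """Hypothetical trimmed-down variant: only readline() is implemented."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self._left = ''

    def readable(self):
        return True

    def readline(self):
        parts = []
        while True:
            i = self._left.find('\n')
            if i == -1:
                # No newline buffered yet: keep what we have, fetch more.
                parts.append(self._left)
                try:
                    self._left = next(self._iter)
                except StopIteration:
                    self._left = ''
                    break
            else:
                parts.append(self._left[:i + 1])
                self._left = self._left[i + 1:]
                break
        return ''.join(parts)


# Sample chunks deliberately split in the middle of lines.
chunks = ['name,si', 'ze\nfoo,1\nba', 'r,2\n']

# Iterating the object calls readline() until it returns ''.
lines = list(LineIteratorIO(chunks))

# Anything that accepts an iterable of lines can consume it directly.
rows = list(csv.reader(LineIteratorIO(chunks)))
print(lines, rows)
```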
Answer 2 (score: 5)
The problem with StringIO is that you have to load everything into the buffer up front. This can be a problem if the generator is infinite :)

    from itertools import chain, islice

    class some_magic_adaptor(object):
        def __init__(self, src):
            self.src = chain.from_iterable(src)

        def read(self, n):
            return "".join(islice(self.src, None, n))

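To illustrate the point about infinite generators, here is a small sketch that feeds an endless generator into the same chain/islice approach. The class is repeated under the hypothetical name `CharStream` so the example is self-contained, and the `endless` generator is made up for the demonstration.

```python
from itertools import chain, count, islice


class CharStream(object):
    """Same technique as above: flatten the strings into a stream of characters."""

    def __init__(self, src):
        self.src = chain.from_iterable(src)

    def read(self, n):
        # islice takes at most n characters and leaves the rest unconsumed.
        return "".join(islice(self.src, None, n))


# An endless generator of strings: "0,", "1,", "2,", ...
endless = (str(i) + "," for i in count())

stream = CharStream(endless)
a = stream.read(4)
b = stream.read(4)
print(a, b)
```

Because this walks the data one character at a time, it is probably slower than a chunk-buffering adaptor on multi-megabyte streams, so it may be worth measuring before relying on it.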
Answer 3 (score: 4)
There is one called werkzeug.contrib.iterio.IterIO, but note that it stores the entire iterator in its memory (until you have read it as a file), so it might not be suitable.
http://werkzeug.pocoo.org/docs/contrib/iterio/
Source: https://github.com/mitsuhiko/werkzeug/blob/master/werkzeug/contrib/iterio.py
A bug on readline/iter: https://github.com/mitsuhiko/werkzeug/pull/500
Answer 4 (score: 4)
Here is a modified version of John's and Matt's answers that can read a list/generator of strings and outputs bytearrays.
Usage:

    ff = IterStringIO(c * 3 for c in ['a', 'b', 'c'])
    while True:
        data = ff.read(4)
        if not data:
            break
        print(data)

    aaab
    bbcc
    c

Alternate usage:

    ff = IterStringIO()
    ff.write('ddd')
    ff.write(c * 3 for c in ['a', 'b', 'c'])
    while True:
        data = ff.read(4)
        if not data:
            break
        print(data)

    ddda
    aabb
    bccc

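The usage above implies a class whose constructor optionally takes an iterable, whose write accepts either a string or an iterable of strings, and whose read returns a bytearray. The following is only a guess at what such a class might look like, written for Python 3; it is a hypothetical sketch and not the answer author's implementation. Note that under Python 3, printing a bytearray shows its repr (for example `bytearray(b'ddda')`) rather than the bare text.

```python
import itertools


class IterStringIO:
    """Hypothetical sketch matching the usage shown above."""

    def __init__(self, iterable=None):
        self._chars = iter(())  # empty character stream to start with
        if iterable is not None:
            self.write(iterable)

    def write(self, data):
        # Accept a single string or an iterable of strings.
        if isinstance(data, str):
            data = [data]
        # Append lazily: nothing is consumed from `data` until read() is called.
        self._chars = itertools.chain(
            self._chars, itertools.chain.from_iterable(data)
        )

    def read(self, n=None):
        # n counts characters, not encoded bytes; None reads to the end.
        text = "".join(itertools.islice(self._chars, n))
        return bytearray(text, "utf-8")


ff = IterStringIO()
ff.write("ddd")
ff.write(c * 3 for c in ["a", "b", "c"])

chunks = []
while True:
    data = ff.read(4)
    if not data:
        break
    chunks.append(data)
print(chunks)
```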
Answer 5 (score: 2)
Looking at Matt's answer, I can see that it is not always necessary to implement all the read methods. read1 may be sufficient; it is described as:

"Read and return up to size bytes, with at most one call to the underlying raw stream's read()..."

It can then be wrapped with io.TextIOWrapper, which, for instance, has an implementation of readline. As an example, here is streaming of a CSV file from S3 (Amazon Simple Storage Service) via boto.s3.key.Key, which implements an iterator for reading.

    import io
    import csv

    from boto import s3

    class StringIteratorIO(io.TextIOBase):

        def __init__(self, iter):
            self._iterator = iter
            self._buffer = ''

        def readable(self):
            return True

        def read1(self, n=None):
            while not self._buffer:
                try:
                    self._buffer = next(self._iterator)
                except StopIteration:
                    break
            result = self._buffer[:n]
            self._buffer = self._buffer[len(result):]
            return result

    conn = s3.connect_to_region('some_aws_region')
    bucket = conn.get_bucket('some_bucket')
    key = bucket.get_key('some.csv')

    fp = io.TextIOWrapper(StringIteratorIO(key))
    reader = csv.DictReader(fp, delimiter=';')
    for row in reader:
        print(row)

Here is an answer to a related question that looks a little better. It inherits from io.RawIOBase and overrides readinto. In Python 3 that is sufficient, so instead of wrapping IterStream in io.BufferedReader, one can wrap it in io.TextIOWrapper. In Python 2, read1 is needed, but it can be expressed simply through readinto.
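The RawIOBase/readinto approach can be sketched roughly as follows for Python 3. This is an illustrative reconstruction and not the code from the linked answer: the class body, the `_leftover` attribute and the sample data are assumptions. The sketch wraps the raw stream in io.BufferedReader for byte reads and layers io.TextIOWrapper on top of that for text, which is the conservative combination.

```python
import csv
import io


class IterStream(io.RawIOBase):
    """Sketch: expose an iterable of bytes objects as a raw binary stream."""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, b):
        # Skip empty chunks so that returning 0 always means end of stream.
        while not self._leftover:
            try:
                self._leftover = next(self._iter)
            except StopIteration:
                return 0
        n = min(len(b), len(self._leftover))
        b[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


# Byte-oriented reads through a buffered reader.
reader = io.BufferedReader(IterStream([b"aaa", b"bbb", b"ccc"]))
byte_reads = []
while True:
    data = reader.read(4)
    if not data:
        break
    byte_reads.append(data)

# Text-oriented reads: decode on the fly and feed the lines to csv.
text = io.TextIOWrapper(
    io.BufferedReader(IterStream([b"x;y\n", b"1;2\n"])),
    encoding="utf-8",
    newline="",
)
rows = [dict(row) for row in csv.DictReader(text, delimiter=";")]
print(byte_reads, rows)
```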
Answer 6 (score: 1)
This is exactly what StringIO is for:

    >>> import StringIO
    >>> some_var = StringIO.StringIO("Hello World!")
    >>> some_var.read(4)
    'Hell'
    >>> some_var.read(4)
    'o Wo'
    >>> some_var.read(4)
    'rld!'
    >>>

Or, if you want to do what it sounds like:

    class MyString(StringIO.StringIO):
        def __init__(self, *args):
            StringIO.StringIO.__init__(self, "".join(args))

Then you can simply do:

    xx = MyString(*list_of_strings)
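The snippets in this answer use the Python 2 StringIO module. A rough Python 3 equivalent, assuming io.StringIO as the replacement, might look like the sketch below; `list_of_strings` is sample data made up for the example. Note that the join materializes every string before the first read, so this does not meet the streaming requirement in the question.

```python
import io


class MyString(io.StringIO):
    def __init__(self, *args):
        # Eager: all fragments are concatenated in memory up front.
        super().__init__("".join(args))


list_of_strings = ["aaa", "bbb", "ccc"]
xx = MyString(*list_of_strings)
reads = [xx.read(4), xx.read(4), xx.read(4)]
print(reads)
```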
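Answer 7 (score: -1)
First of all, your generator will have to yield byte objects. While there isn't anything built in, you can use a combination of http://docs.python.org/library/stringio.html and itertools.chain.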