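As I understand it, the buffer_size argument to io.BufferedReader is supposed to control the size of the read buffer passed to the underlying reader.
However, I'm not seeing that behaviour. Instead, when I reader.read() the entire file, io.DEFAULT_BUFFER_SIZE is used and buffer_size is ignored. When I reader.read(length), length is used as the buffer size and the buffer_size argument is again ignored.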
Minimal example:
import io

class MyReader(io.RawIOBase):
    """Raw stream of `length` bytes that reports each buffer it is asked to fill."""

    def __init__(self, length):
        self.length = length
        self.position = 0

    def readinto(self, b):
        # len(b) is the size of the read the BufferedReader requested.
        print('read buffer length: %d' % len(b))
        length = min(len(b), self.length - self.position)
        self.position += length
        b[:length] = b'a' * length  # bytes literal, so this also runs on Python 3
        return length

    def readable(self):
        return True

    def seekable(self):
        return False

print('# read entire file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read()))

print('\n# read part of file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(10000)))

print('\n# read beyond end of file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(30000)))
Am I misunderstanding how BufferedReader is supposed to work?
Answer 0 (score: 1)
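The point of BufferedReader is to keep an internal buffer, and buffer_size sets the size of that buffer. The buffer is used to satisfy smaller reads, so that a slower I/O device is not hit with many read calls.
The buffer does not try to limit the size of reads, however!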
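From the io.BufferedReader documentation:
"When reading data from this object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads."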
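The object inherits from io.BufferedIOBase, whose documentation states:
"The main difference with RawIOBase is that methods read(), readinto() and write() will try (respectively) to read as much input as requested or to consume all given output, at the expense of making perhaps more than one system call."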
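Because you called .read() on the object, larger blocks are read from the wrapped object in order to read all the data through to the end. The internal buffer that the BufferedReader() instance holds does not come into play here; you asked for all the data, after all.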
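The same holds for a sized read that is much larger than the buffer. Here is a minimal sketch of that case, assuming Python 3 on CPython. RecordingReader is a hypothetical stand-in for the question's MyReader that stores the requested sizes in a list instead of printing them:

```python
import io


class RecordingReader(io.RawIOBase):
    """Hypothetical raw stream that records the size of each buffer it must fill."""

    def __init__(self, length):
        super().__init__()
        self.remaining = length
        self.requests = []

    def readable(self):
        return True

    def readinto(self, b):
        # len(b) is the size of the read requested by the BufferedReader.
        self.requests.append(len(b))
        n = min(len(b), self.remaining)
        b[:n] = b"a" * n
        self.remaining -= n
        return n


raw = RecordingReader(20000)
reader = io.BufferedReader(raw, buffer_size=100)
data = reader.read(10000)  # far larger than the 100-byte internal buffer
requests_seen = list(raw.requests)
print(len(data), requests_seen)
```

On CPython this should print 10000 [10000]: the request is passed to the raw stream in one piece rather than being split into 100-byte reads.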
The buffer does come into play if you read in smaller blocks:
>>> reader = io.BufferedReader(MyReader(2048), buffer_size=512)
>>> __ = reader.read(42) # initial read, fill buffer
read buffer length: 512
>>> __ = reader.read(123) # within the buffer, no read to underlying file needed
>>> __ = reader.read(456) # deplete buffer, another read needed to re-fill
read buffer length: 512
>>> __ = reader.read(123) # within the buffer, no read to underlying file needed
>>> __ = reader.read() # read until end, uses larger blocks to read from wrapped file
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
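For anyone who wants to re-run the small-read part of that session as a script, here is a rough Python 3 sketch of it. RecordingReader is again a hypothetical helper, not part of io, and it collects the requested buffer sizes rather than printing them:

```python
import io


class RecordingReader(io.RawIOBase):
    """Hypothetical raw stream that records the size of each buffer it must fill."""

    def __init__(self, length):
        super().__init__()
        self.remaining = length
        self.requests = []

    def readable(self):
        return True

    def readinto(self, b):
        self.requests.append(len(b))
        n = min(len(b), self.remaining)
        b[:n] = b"a" * n
        self.remaining -= n
        return n


raw = RecordingReader(2048)
reader = io.BufferedReader(raw, buffer_size=512)

# Four small reads, as in the session above; only two of them
# need to go to the raw stream to (re)fill the 512-byte buffer.
chunks = [reader.read(size) for size in (42, 123, 456, 123)]
small_read_requests = list(raw.requests)
print(small_read_requests)
```

On CPython this should print [512, 512], matching the two "read buffer length: 512" lines in the session: each refill asks the raw stream for exactly buffer_size bytes, and the reads in between are served from memory.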