Python: How to use a custom buffer_size in io.BufferedReader?

Time: 2018-11-08 19:18:50

Tags: python python-2.7 io

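As far as I understand, the buffer_size argument of io.BufferedReader is supposed to control the size of the read buffer passed to the underlying reader.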

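However, I am not seeing that behavior. Instead, when I call reader.read() to read the entire file, io.DEFAULT_BUFFER_SIZE is used and buffer_size is ignored. When I call reader.read(length), length is used as the buffer size, and the buffer_size argument is again ignored.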

Minimal example:

import io

class MyReader(io.RawIOBase):

    def __init__(self, length):
        self.length = length
        self.position = 0

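    def readinto(self, b):
        # Report the size of the buffer the BufferedReader asked us to fill.
        print('read buffer length: %d' % len(b))
        length = min(len(b), self.length - self.position)
        self.position += length
        # Bytes literal, so the assignment works on both Python 2.7 and Python 3.
        b[:length] = b'a' * length
        return length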

    def readable(self):
        return True

    def seekable(self):
        return False


print('# read entire file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read()))

print('\n# read part of file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(10000)))

print('\n# read beyond end of file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(30000)))

Am I misunderstanding how BufferedReader is supposed to work?

1 answer:

Answer 0 (score: 1)

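The point of io.BufferedReader is to keep an internal buffer, and buffer_size sets the size of that buffer. The buffer is used to satisfy smaller reads, which avoids making many read calls against a slower I/O device.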

However, the buffer does not try to limit the size of reads!

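From the io.BufferedReader documentation:

"When reading data from this object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads."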

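The object inherits from io.BufferedIOBase, which states:

"The main difference with RawIOBase is that methods read(), readinto() and write() will try (respectively) to read as much input as requested or to consume all given output, at the expense of making perhaps more than one system call."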
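To make that difference concrete, here is a minimal sketch that compares a raw read with a buffered read. ShortReader, its step parameter, and the sizes used are made up for illustration; the script is written for Python 3 and has only been reasoned about against CPython.

```python
import io


class ShortReader(io.RawIOBase):
    """Hypothetical raw stream that hands out at most `step` bytes per call."""

    def __init__(self, length, step):
        super(ShortReader, self).__init__()
        self.remaining = length
        self.step = step
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        n = min(len(b), self.step, self.remaining)
        b[:n] = b'a' * n
        self.remaining -= n
        return n


# A raw read makes a single readinto() call, so it may come back short.
raw = ShortReader(length=1000, step=7)
raw_chunk = raw.read(20)
print('raw read returned %d bytes in %d call(s)' % (len(raw_chunk), raw.calls))

# The buffered wrapper keeps calling the raw stream until it has 20 bytes.
raw2 = ShortReader(length=1000, step=7)
buffered = io.BufferedReader(raw2, buffer_size=64)
buffered_chunk = buffered.read(20)
print('buffered read returned %d bytes in %d call(s)'
      % (len(buffered_chunk), raw2.calls))
```

The raw stream returns whatever one readinto() call produced, while the BufferedReader repeats the call until the requested byte count is reached or the stream ends.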

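Because you called .read() on the object, larger blocks are read from the wrapped object so that all data up to the end is read. The internal buffer held by the BufferedReader instance does not come into play here; you asked for all the data, after all.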
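A small script along these lines can show which sizes the wrapper asks the raw stream for. RecordingReader and requested_sizes are hypothetical helpers, not part of the io module, and the exact sizes requested are a CPython implementation detail rather than documented behavior, so treat this as a sketch.

```python
import io


class RecordingReader(io.RawIOBase):
    """Hypothetical raw stream that records the size of every requested read."""

    def __init__(self, length):
        super(RecordingReader, self).__init__()
        self.remaining = length
        self.sizes = []

    def readable(self):
        return True

    def readinto(self, b):
        self.sizes.append(len(b))
        n = min(len(b), self.remaining)
        b[:n] = b'a' * n
        self.remaining -= n
        return n


def requested_sizes(read_size):
    """Return the raw read sizes triggered by one read(read_size) call."""
    raw = RecordingReader(20000)
    reader = io.BufferedReader(raw, buffer_size=100)
    reader.read(read_size)
    return raw.sizes


small = requested_sizes(10)      # fits inside the 100-byte buffer
large = requested_sizes(10000)   # far larger than the buffer
print('small read requested: %r' % small)
print('large read requested: %r' % large)
```

On CPython 3 the small read should stay within buffer_size, while the large read asks the raw stream for more than buffer_size at once, which matches what the question observed.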

If you read in smaller blocks, the buffer does come into play:

>>> reader = io.BufferedReader(MyReader(2048), buffer_size=512)
>>> __ = reader.read(42)  # initial read, fill buffer
read buffer length: 512
>>> __ = reader.read(123)  # within the buffer, no read to underlying file needed
>>> __ = reader.read(456)  # deplete buffer, another read needed to re-fill
read buffer length: 512
>>> __ = reader.read(123)  # within the buffer, no read to underlying file needed
>>> __ = reader.read()     # read until end, uses larger blocks to read from wrapped file
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
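If the goal is to keep each request to the wrapped object at or below buffer_size, one option is to read in a loop with a chunk size no larger than the buffer. The sketch below assumes a hypothetical RecordingReader and an arbitrary chunk size of 64; the bound it shows follows from how CPython fills its buffer and is not a documented guarantee.

```python
import io


class RecordingReader(io.RawIOBase):
    """Hypothetical raw stream that records the size of every requested read."""

    def __init__(self, length):
        super(RecordingReader, self).__init__()
        self.remaining = length
        self.sizes = []

    def readable(self):
        return True

    def readinto(self, b):
        self.sizes.append(len(b))
        n = min(len(b), self.remaining)
        b[:n] = b'a' * n
        self.remaining -= n
        return n


BUFFER_SIZE = 100
CHUNK_SIZE = 64  # arbitrary, but deliberately smaller than BUFFER_SIZE

raw = RecordingReader(1000)
reader = io.BufferedReader(raw, buffer_size=BUFFER_SIZE)

total = 0
# iter() with a sentinel keeps calling read() until it returns b'' at EOF.
for chunk in iter(lambda: reader.read(CHUNK_SIZE), b''):
    total += len(chunk)

print('bytes read: %d' % total)
print('largest raw read requested: %d' % max(raw.sizes))
```

Each read(CHUNK_SIZE) is served from the internal buffer, so the wrapped object is only asked to refill that buffer and never sees a request larger than BUFFER_SIZE.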