Question

The following code lazily prints the contents of a text file line by line, with each print stopping at '\n'.

   with open('eggs.txt', 'rb') as file:
       for line in file:
           print line

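Is there any way to lazily print the contents of a text file so that each print stops at ','?

(Or at any other character or string.)

I'm asking because I'm trying to read a file that consists of a single 2.9 GB line of comma-separated values.

P.S. My question is different from this one: Read large text files in Python, line by line without loading it in to memory. I'm asking how to stop at a character other than the newline ('\n').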

Answer 1

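I don't think there is a built-in way to do this. You will have to read the file block by block with file.read(block_size), split each block at the commas, and manually rejoin the strings that span block boundaries.

Note that you can still run out of memory if the file goes a long stretch without a comma. (The same problem applies to reading a file line by line when it contains a very long line.)

Here is an example implementation: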

def split_file(file, sep=",", block_size=16384):
    last_fragment = ""
    while True:
        block = file.read(block_size)
        if not block:
            break
        block_fragments = iter(block.split(sep))
        last_fragment += next(block_fragments)
        for fragment in block_fragments:
            yield last_fragment
            last_fragment = fragment
    yield last_fragment
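
To see how the generator behaves when a fragment straddles a block boundary, here is a minimal usage sketch. It repeats the function so the snippet runs on its own. The io.StringIO sample and the tiny block_size=4 are assumptions chosen purely for illustration; they are not part of the original answer.

```python
import io


def split_file(file, sep=",", block_size=16384):
    # Same generator as above, repeated so this snippet is self-contained.
    last_fragment = ""
    while True:
        block = file.read(block_size)
        if not block:
            break
        block_fragments = iter(block.split(sep))
        # The first piece of a block continues the previous fragment.
        last_fragment += next(block_fragments)
        for fragment in block_fragments:
            yield last_fragment
            last_fragment = fragment
    yield last_fragment


# io.StringIO stands in for a real file; block_size=4 forces fragments
# to span several reads.
sample = io.StringIO("spam,eggs and ham,bacon")
fragments = list(split_file(sample, block_size=4))
print(fragments)  # ['spam', 'eggs and ham', 'bacon']
```

One caveat: each block is split separately, so a multi-character sep that itself falls across a block boundary would not be detected. For a single-character separator such as ',' this cannot happen.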

Answer 2

Using buffered reads from the file (Python 3):

buffer_size = 2**12
delimiter = ','

with open(filename, 'r') as f:
    # remember the characters after the last delimiter in the previously processed chunk
    remaining = ""

    while True:
        # read the next chunk of characters from the file
        chunk = f.read(buffer_size)

        # end the loop if the end of the file has been reached
        if not chunk:
            break

        # add the remaining characters from the previous chunk,
        # split according to the delimiter, and keep the remaining
        # characters after the last delimiter separately
        *lines, remaining = (remaining + chunk).split(delimiter)

        # print the parts up to each delimiter one by one
        for line in lines:
            print(line, end=delimiter)

    # print the characters after the last delimiter in the file
    if remaining:
        print(remaining, end='')

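Note that, as currently written, this prints the contents of the original file exactly as they are. That is easy to change, for example by changing the end=delimiter argument passed to the print() function inside the loop.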
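
If the pieces are needed for something other than printing, the same loop can be wrapped in a generator. The following is only a sketch under assumptions: the function name iter_delimited and the io.StringIO demo input are hypothetical and do not come from the original answer.

```python
import io


def iter_delimited(f, delimiter=",", buffer_size=2**12):
    """Yield the pieces of f between delimiters, one at a time."""
    remaining = ""
    while True:
        chunk = f.read(buffer_size)
        if not chunk:
            break
        # Everything before the last delimiter is complete; the tail may
        # continue in the next chunk, so carry it over.
        *parts, remaining = (remaining + chunk).split(delimiter)
        for part in parts:
            yield part
    # Like the loop above, skip an empty tail after a trailing delimiter.
    if remaining:
        yield remaining


# buffer_size=2 is deliberately tiny so that the two-character
# delimiter ", " gets cut in half between reads.
demo = io.StringIO("a, b, c, d")
pieces = list(iter_delimited(demo, delimiter=", ", buffer_size=2))
print(pieces)  # ['a', 'b', 'c', 'd']
```

Because the leftover text is joined to the next chunk before splitting, a multi-character delimiter that is cut in half by a read is still recognized.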

Answer 3

The following answer can be considered lazy, because it reads only one character at a time:

def commaBreak(filename):
    word = ""
    with open(filename) as f:
        while True:
            char = f.read(1)
            if not char:
                print("End of file")
                yield word
                break
            elif char == ',':
                yield word
                word = ""
            else:
                word += char

You could also do this with a larger number of characters per read, for example 1000 at a time.
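
A rough sketch of that larger-read variant is shown below. The name comma_break_chunked, its parameters, and the choice to take an already-open file object are assumptions made for illustration; the demo uses io.StringIO in place of a real file.

```python
import io


def comma_break_chunked(f, sep=",", chunk_size=1000):
    """Yield sep-separated words from f, reading chunk_size characters at a time."""
    buffer = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            # End of file: whatever is left is the final word.
            yield buffer
            break
        buffer += chunk
        # Emit every complete word currently held in the buffer.
        while True:
            index = buffer.find(sep)
            if index == -1:
                break
            yield buffer[:index]
            buffer = buffer[index + len(sep):]


# chunk_size=5 is unrealistically small; it is used here only so that
# words are split across reads.
words = list(comma_break_chunked(io.StringIO("one,two,,three"), chunk_size=5))
print(words)  # ['one', 'two', '', 'three']
```

Like the one-character version, this yields the trailing word at end of file even when it is empty.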

Answer 4

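with open('eggs.txt', 'r') as file:
    for line in file:
        # Split each line at ', ' and print the words one by one.
        # Text mode is used so that line is already a str.
        words = line.split(', ')
        for word in words:
            print(word)

I'm not entirely sure I understand what you're asking. Is this what you mean?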

Answer 5

It yields each character from the file as soon as it is read, which means memory is never overloaded.

def lazy_read():
    # Text mode, so that each character read compares equal to ','.
    with open('eggs.txt', 'r') as file:
        item = file.read(1)
        while item:
            if item == ',':
                # Stop at the first comma; returning ends the generator.
                return
            yield item
            item = file.read(1)

print(''.join(lazy_read()))
