Python lazy loading

Date: 2016-08-25 08:32:26

Tags: python lazy-loading

The following code lazily prints the contents of a text file line by line, with each print stopping at '\n'.

   with open('eggs.txt', 'rb') as file:
       for line in file:
           print line

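Is there any way to lazily print the contents of a text file so that each print stops at ','?

(or at any other character/string)

I am asking because I am trying to read a file that consists of a single 2.9 GB line with comma-separated values.

PS. My question is different from this one: Read large text files in Python, line by line without loading it in to memory. I am asking how to stop at a character other than the newline ('\n').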

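5 Answers:

Answer 0: (score: 3)

I don't think there is a built-in way to achieve this. You will have to read the file block by block with file.read(block_size), split each block at the commas, and manually rejoin the strings that span block boundaries.

Note that you may still run out of memory if no comma appears for a long stretch. (The same problem applies to reading a file line by line when a very long line is encountered.)

Here is an example implementation: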

def split_file(file, sep=",", block_size=16384):
    last_fragment = ""
    while True:
        block = file.read(block_size)
        if not block:
            break
        block_fragments = iter(block.split(sep))
        last_fragment += next(block_fragments)
        for fragment in block_fragments:
            yield last_fragment
            last_fragment = fragment
    yield last_fragment
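
The memory caveat above can be made explicit. The following is a minimal sketch, not part of the answer: it follows the same block-by-block approach but raises an error once the pending fragment grows past a limit, instead of silently consuming memory. The names split_file_guarded and max_fragment are hypothetical and chosen only for illustration, and a single-character separator is assumed.

```python
import io


def split_file_guarded(file, sep=",", block_size=16384, max_fragment=1 << 20):
    """Yield sep-delimited fragments from an open text file.

    Raises ValueError if the unfinished fragment exceeds max_fragment
    characters. Assumes sep is a single character.
    """
    last_fragment = ""
    while True:
        block = file.read(block_size)
        if not block:
            break
        fragments = iter(block.split(sep))
        # the first piece continues the fragment left over from the previous block
        last_fragment += next(fragments)
        for fragment in fragments:
            yield last_fragment
            last_fragment = fragment
        # the pending fragment has not seen a separator yet; bound its size
        if len(last_fragment) > max_fragment:
            raise ValueError("no separator found within max_fragment characters")
    yield last_fragment


# a tiny block size forces fragments to span block boundaries
demo = io.StringIO("spam,eggs,ham")
print(list(split_file_guarded(demo, block_size=4)))  # ['spam', 'eggs', 'ham']
```

The check runs once per block, so a yielded fragment can exceed the limit by at most one block; that seemed an acceptable trade-off for keeping the loop simple.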

Answer 1: (score: 2)

Using buffered reads from the file (Python 3):

buffer_size = 2**12
delimiter = ','

with open(filename, 'r') as f:
    # remember the characters after the last delimiter in the previously processed chunk
    remaining = ""

    while True:
        # read the next chunk of characters from the file
        chunk = f.read(buffer_size)

        # end the loop if the end of the file has been reached
        if not chunk:
            break

        # add the remaining characters from the previous chunk,
        # split according to the delimiter, and keep the remaining
        # characters after the last delimiter separately
        *lines, remaining = (remaining + chunk).split(delimiter)

        # print the parts up to each delimiter one by one
        for line in lines:
            print(line, end=delimiter)

    # print the characters after the last delimiter in the file
    if remaining:
        print(remaining, end='')

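Note that, as currently written, this prints the contents of the original file exactly as they are. That is easy to change, for example by changing the end=delimiter argument passed to the print() function in the loop.

Answer 2: (score: 1)

The following answer can be considered lazy because it reads only one character at a time: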

def commaBreak(filename):
    word = ""
    with open(filename) as f:
        while True:
            char = f.read(1)
            if not char:
                print("End of file")
                yield word
                break
            elif char == ',':
                yield word
                word = ""
            else:
                word += char

You can also do this with a larger number of characters read at a time, for example 1000.
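
A minimal sketch of that idea for Python 3, under some assumptions: the function name comma_break_chunked and the chunk_size parameter are hypothetical and not from the answer. It reads chunk_size characters per call instead of one, and keeps the text after the last delimiter in a buffer so that fields (and multi-character delimiters) spanning chunk boundaries are reassembled.

```python
import io


def comma_break_chunked(f, delimiter=",", chunk_size=1000):
    """Yield delimiter-separated fields from the open text file f."""
    buffer = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # everything before the last delimiter is complete; keep the tail
        *fields, buffer = buffer.split(delimiter)
        for field in fields:
            yield field
    # whatever follows the final delimiter is the last field
    yield buffer


# a small chunk size shows that a two-character delimiter still works
sample = io.StringIO("a::b::c")
print(list(comma_break_chunked(sample, delimiter="::", chunk_size=3)))  # ['a', 'b', 'c']
```

Like the single-character version, this still holds one whole field in memory, so a very long run without a delimiter remains a problem.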

Answer 3: (score: -1)

# text mode, so each line is already a str
with open('eggs.txt', 'r') as file:
    for line in file:
        # split each line at ', ' and print the pieces one by one
        words = line.split(', ')
        for word in words:
            print(word)

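I'm not entirely sure I understand what you are asking. Is this what you mean?

Answer 4: (score: -1)

This yields each character from the file as soon as it is read, so there is no memory overload.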

def lazy_read():
    # text mode, so that each item compares equal to ',' in Python 3 as well
    with open('eggs.txt', 'r') as file:
        item = file.read(1)
        while item:
            if item == ',':
                # stop at the first comma
                return
            yield item
            item = file.read(1)

print(''.join(lazy_read()))