The following code lazily prints the contents of a text file line by line, with each print stopping at '\n'.
with open('eggs.txt', 'rb') as file:
    for line in file:
        print line
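Is there any way to lazily print the contents of a text file so that each print stops at ',' (or at any other character or string)?
I ask because I am trying to read a file that consists of a single 2.9 GB line of comma-separated values.
PS. My question is different from this one: Read large text files in Python, line by line without loading it in to memory. I am asking how to stop at a character other than the newline ('\n').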
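Answer 0 (score: 3)
I don't think there is a built-in way to do this. You have to use file.read(block_size) to read the file block by block, split each block at the commas, and manually rejoin the strings that span block boundaries.
Note that you may still run out of memory if the file goes a long way without a comma. (The same problem applies to reading a file line by line when it contains a very long line.)
Here is a sample implementation: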
def split_file(file, sep=",", block_size=16384):
    last_fragment = ""
    while True:
        block = file.read(block_size)
        if not block:
            break
        block_fragments = iter(block.split(sep))
        last_fragment += next(block_fragments)
        for fragment in block_fragments:
            yield last_fragment
            last_fragment = fragment
    yield last_fragment
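To see how fragments are rejoined across block boundaries, the generator can be fed an in-memory file with a deliberately tiny block size. This is only an illustrative sketch: the sample text and `block_size=4` are made up for the demonstration, and `split_file` is repeated so the snippet runs on its own.

```python
import io

def split_file(file, sep=",", block_size=16384):
    """Yield the sep-separated fields of file, reading block_size characters at a time."""
    last_fragment = ""
    while True:
        block = file.read(block_size)
        if not block:
            break
        block_fragments = iter(block.split(sep))
        # The first piece continues whatever was left over from the previous block.
        last_fragment += next(block_fragments)
        for fragment in block_fragments:
            yield last_fragment
            last_fragment = fragment
    yield last_fragment

# Hypothetical sample data standing in for the real 2.9 GB file.
sample = io.StringIO("spam,eggs,bacon,sausage")
fields = list(split_file(sample, block_size=4))
print(fields)  # ['spam', 'eggs', 'bacon', 'sausage']
```

Note that each block is split on its own, so this approach is safe for a single-character separator such as ','. A separator longer than one character could be cut in half by a block boundary and missed.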
Answer 1 (score: 2)
Using buffered reads from the file (Python 3):
buffer_size = 2**12
delimiter = ','
with open(filename, 'r') as f:
    # remember the characters after the last delimiter in the previously processed chunk
    remaining = ""
    while True:
        # read the next chunk of characters from the file
        chunk = f.read(buffer_size)
        # end the loop if the end of the file has been reached
        if not chunk:
            break
        # add the remaining characters from the previous chunk,
        # split according to the delimiter, and keep the remaining
        # characters after the last delimiter separately
        *lines, remaining = (remaining + chunk).split(delimiter)
        # print the parts up to each delimiter one by one
        for line in lines:
            print(line, end=delimiter)
    # print the characters after the last delimiter in the file
    if remaining:
        print(remaining, end='')
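Note that, as currently written, this prints the contents of the original file exactly as they are. That is easy to change, for example by changing the end=delimiter argument passed to the print() function in the loop.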
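The same buffering idea can also be wrapped in a generator, so that the caller decides what to do with each piece instead of printing it. The sketch below is a variant under stated assumptions, not the answer's original code: the name `iter_fields` and the in-memory sample are hypothetical. Because the leftover text is prepended to the next chunk before splitting, a delimiter longer than one character is still found when it straddles a chunk boundary.

```python
import io

def iter_fields(f, delimiter=",", buffer_size=2**12):
    """Lazily yield the delimiter-separated fields of the text file object f."""
    remaining = ""
    while True:
        chunk = f.read(buffer_size)
        if not chunk:
            break
        # Everything before the last delimiter is complete; the tail may be
        # an unfinished field (or half a delimiter), so carry it over.
        *fields, remaining = (remaining + chunk).split(delimiter)
        yield from fields
    # Mirror str.split(): the text after the last delimiter is always yielded,
    # even when it is empty.
    yield remaining

# Hypothetical sample with a two-character delimiter and a tiny buffer.
sample = io.StringIO("a, b, c, d")
result = list(iter_fields(sample, delimiter=", ", buffer_size=2))
print(result)  # ['a', 'b', 'c', 'd']
```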
Answer 2 (score: 1)
The following answer can be considered lazy, because it reads only one character at a time:
def commaBreak(filename):
    word = ""
    with open(filename) as f:
        while True:
            char = f.read(1)
            if not char:
                # end of file: emit whatever follows the last comma
                print("End of file")
                yield word
                break
            elif char == ',':
                # a comma ends the current word
                yield word
                word = ""
            else:
                word += char
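You could also do this with a larger number of characters read at a time, say 1000, in which case each chunk has to be split at the commas rather than compared against ',' directly.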
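A minimal sketch of that larger-chunk variant follows. It is an assumption about how the idea could be carried out, not the answer author's code: the name `comma_break`, the `chunk_size` parameter, and the sample data are hypothetical. It relies on the two-argument form of `iter()`, which keeps calling `f.read(chunk_size)` until it returns the empty string.

```python
import io
from functools import partial

def comma_break(f, sep=",", chunk_size=1000):
    """Yield sep-separated words from f, reading chunk_size characters per call."""
    word = ""
    # iter(callable, sentinel) stops as soon as read() returns "" (end of file).
    for chunk in iter(partial(f.read, chunk_size), ""):
        word += chunk
        # Emit every complete word currently in the buffer.
        while sep in word:
            head, _, word = word.partition(sep)
            yield head
    # Whatever follows the last separator is the final word.
    yield word

# Hypothetical sample data; chunk_size is tiny only to exercise the boundaries.
words = list(comma_break(io.StringIO("one,two,three"), chunk_size=5))
print(words)  # ['one', 'two', 'three']
```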
Answer 3 (score: -1)
with open('eggs.txt', 'r') as file:  # text mode, so each line is already a str
    for line in file:
        # note: this still reads a whole line into memory before splitting
        words = line.split(', ')
        for word in words:
            print(word)
I'm not entirely sure I understand what you are asking. Is this what you mean?
Answer 4 (score: -1)
It yields each character from the file as soon as it is read, which means memory is never overloaded.
def lazy_read():
    # text mode, so that each item is a str and can be compared with ','
    with open('eggs.txt', 'r') as file:
        item = file.read(1)
        while item:
            if item == ',':
                # stop at the first comma; a plain return ends the generator
                return
            yield item
            item = file.read(1)

print(''.join(lazy_read()))