A recent question about splitting a binary file using null characters got me thinking about a similar, text-oriented question.
Given the following file:
Parse me using spaces, please.
With Perl 6, I can parse this file using a space (or any other chosen character) as the input newline, like this:
my $fh = open('spaced.txt', nl-in => ' ');
while $fh.get -> $line {
    put $line;
}
Or, more concisely:
.put for 'spaced.txt'.IO.lines(nl-in => ' ');
Either of these gives the following result:
Parse
me
using
spaces,
please.
Is there an equivalent in Python 3?
The closest I could find requires reading the entire file into memory:
for line in f.read().split('\0'):
    print(line)
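Update: I found a couple of other, older questions and answers that seem to indicate this isn't available, but I figured there may have been new developments in this area over the last few years: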
Python restrict newline characters for readlines()
Change newline character .readline() seeks
Answer 0 (score: 3)
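There is no built-in support for reading a file split on a custom character.
However, opening the file with the "U" flag enables universal newlines, and the kinds of newline encountered can then be read from file.newlines; that newline mode stays in effect for the whole file. Note that universal newlines only recognizes '\n', '\r' and '\r\n', not arbitrary characters. In Python 3 it is the default for files opened in text mode, and the "U" flag was deprecated and then removed in Python 3.11.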
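To make this concrete for Python 3, here is a minimal sketch showing that universal newlines translates only the three standard line endings, and that open()'s newline argument rejects anything else (such as a space). The helper names read_mixed_newlines and open_accepts_newline are made up for illustration:

```python
import os
import tempfile


def read_mixed_newlines():
    """Write text with mixed line endings, then read it back with universal newlines."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="" on write: the text is written exactly as given, no translation
        with open(path, "w", newline="") as f:
            f.write("a\nb\r\nc\rd")
        # newline=None (the default) on read: universal newlines mode
        with open(path) as f:
            text = f.read()    # every '\r\n' and '\r' comes back as '\n'
            seen = f.newlines  # the line-ending kinds translated so far
    finally:
        os.remove(path)
    return text, seen


def open_accepts_newline(value):
    """Return True if open() accepts value as its newline argument."""
    try:
        with open(os.devnull, newline=value):
            return True
    except ValueError:  # raised for anything but None, '', '\n', '\r', '\r\n'
        return False


text, seen = read_mixed_newlines()
print(repr(text))                 # 'a\nb\nc\nd'
print(open_accepts_newline(" "))  # False: a space cannot be the line separator
```

So the standard library's newline handling cannot be pointed at a custom character, which is why the answers here fall back to splitting the data themselves.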
Here is my generator, which reads the file piece by piece instead of loading all of it into memory:
def customReadlines(fileNextBuff, char):
    """
    :param fileNextBuff: a function returning the next buffer, or "" on EOF
    :param char: the string to split on; it is not included in the yielded elements
    """
    lastLine = ""
    lenChar = len(char)
    while True:
        thisLine = fileNextBuff()  # call it to fetch the next buffer
        if not thisLine:
            break  # EOF
        fnd = thisLine.find(char)
        while fnd != -1:
            yield lastLine + thisLine[:fnd]
            lastLine = ""
            thisLine = thisLine[fnd + lenChar:]
            fnd = thisLine.find(char)
        # note: a multi-character char split across two buffers is not detected
        lastLine += thisLine
    yield lastLine  # the final piece (empty if the input ends with char)
### EXAMPLES ###
# open file.bin and print each null-terminated part, reading 256 (0x100) characters at a time
with open("file.bin", "r") as f:
    for l in customReadlines(lambda: f.read(0x100), "\0"):
        print(l)

# open errors.log and split it on the string "ERROR:", reading one whole line at a time
with open("errors.log", "r") as f:
    for l in customReadlines(f.readline, "ERROR:"):
        print(l)
        print(" " + "-" * 78)  # some separator
Answer 1 (score: 1)
Would this do what you need?
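def newreadline(f, newlinechar='\0'):
    # read one character at a time until newlinechar or EOF;
    # like readline(), the separator is kept at the end of the returned string
    c = f.read(1)
    b = [c]
    while c != newlinechar and c != '':
        c = f.read(1)
        b.append(c)
    return ''.join(b)  # '' means EOF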
Edit: added a readlines() equivalent:
def newreadlines(f, newlinechar='\0'):
    line = newreadline(f, newlinechar)
    while line:
        yield line
        line = newreadline(f, newlinechar)
so that the OP can do the following:
for line in newreadlines(f, newlinechar='\0'):
    print(line)
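The question that prompted this one was about binary files split on null bytes. A minimal sketch of the same byte-at-a-time idea for a file opened in binary mode ("rb"), assuming the separator is a single byte; read_records is a hypothetical name, not part of any library:

```python
import io


def read_records(f, sep=b"\0"):
    """Yield sep-terminated records from a binary file object, one byte at a time.

    Like newreadline(), each record keeps its trailing separator; the last
    record lacks one if the data does not end with sep.
    """
    while True:
        record = bytearray()
        while True:
            byte = f.read(1)
            if not byte:  # EOF
                break
            record += byte
            if byte == sep:
                break
        if not record:  # nothing left to read
            return
        yield bytes(record)


for rec in read_records(io.BytesIO(b"ab\0cd\0e")):
    print(rec)
```

With a real file, pass the object returned by open("file.bin", "rb"); it is already buffered, so the one-byte reads do not each hit the disk.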
Answer 2 (score: 0)
def parse(fp, split_char, read_size=16):
    def give_chunks():
        while True:
            stuff = fp.read(read_size)
            if not stuff:
                break
            yield stuff

    leftover = ''
    for chunk in give_chunks():
        *stuff, leftover = (leftover + chunk).split(split_char)
        yield from stuff
    if leftover:
        yield leftover
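The inner give_chunks() generator can also be written with the two-argument form of iter(), which calls a function until it returns a sentinel value. A sketch of that variant (split_stream is a hypothetical name); it keeps the same leftover technique, so a separator that straddles two chunks is still found:

```python
import io
from functools import partial


def split_stream(fp, sep, read_size=4096):
    """Yield the pieces of a text stream separated by sep, reading read_size chars at a time."""
    leftover = ""
    # iter(callable, "") keeps calling fp.read(read_size) until it returns "" (EOF);
    # for a binary stream the sentinel and sep would have to be bytes instead
    for chunk in iter(partial(fp.read, read_size), ""):
        # re-attach the unfinished tail so pieces and separators spanning chunks are handled
        *pieces, leftover = (leftover + chunk).split(sep)
        yield from pieces
    if leftover:
        yield leftover


print(list(split_stream(io.StringIO("Parse me using spaces, please."), " ", read_size=4)))
# ['Parse', 'me', 'using', 'spaces,', 'please.']
```

As with parse(), the leftover grows until a separator appears, so a file that never contains sep still ends up fully in memory.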
If you're fine with splitting on newlines as well as on split_char, you can use the following (e.g. to read a text file word by word):
def parse(fobj, split_char):
    for line in fobj:
        yield from line.split(split_char)
In [5]: for word in parse(open('stuff.txt'), ' '):
   ...:     print(word)
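One caveat: line.split(' ') leaves the '\n' attached to the last word of each line and yields empty strings for runs of spaces. If "any whitespace" is an acceptable separator, calling str.split() with no argument avoids both. A sketch (words is a hypothetical helper, not a drop-in replacement for split_char):

```python
import io


def words(fobj):
    """Yield whitespace-separated words from a text file object, one line at a time."""
    for line in fobj:
        # split() with no argument splits on runs of any whitespace, including the
        # trailing '\n', so no empty strings or newline characters are yielded
        yield from line.split()


for word in words(io.StringIO("Parse me  using\nspaces, please.\n")):
    print(word)
```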