Question

A recent question about splitting a binary file using null characters made me think of a similar text-oriented question.

Given the following file:

Parse me using spaces, please.

Using Perl 6, I can parse this file using a space (or any chosen character) as the input newline character, like so:

my $fh = open('spaced.txt', nl-in => ' ');

while $fh.get -> $line {
    put $line;
}

Or, more concisely:

.put for 'spaced.txt'.IO.lines(nl-in => ' ');

Either of which gives the following result:

Parse
me
using
spaces,
please.

Is there something equivalent in Python 3?

The closest I could find requires reading the entire file into memory:

for line in f.read().split('\0'):
    print(line)

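Update: I found a couple of other, older questions and answers that seem to indicate this isn't available, but I figured there may have been new developments in this area in the past several years:

Python restrict newline characters for readlines()
Change newline character .readline() seeks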

Answer 1

There is no built-in support for reading a file split by a custom character.

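However, opening the file with the "U" flag enables universal newlines, and the line endings actually encountered can be retrieved from file.newlines. The newline mode stays in effect for the whole file. (In Python 3, universal newlines is already the default in text mode, and the "U" flag was removed in Python 3.11.)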
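To illustrate the limitation described above, here is a small sketch (the helper name newline_is_accepted and the temporary file are mine, not part of the answer). As far as I know, open() only accepts None, '', '\n', '\r' and '\r\n' for its newline argument and raises ValueError for anything else, so a space cannot be passed there. The newlines attribute only reports which standard line endings were seen.

```python
import os
import tempfile

def newline_is_accepted(path, value):
    """Return True if open() accepts `value` as its newline argument."""
    try:
        with open(path, "r", newline=value):
            return True
    except ValueError:
        return False

# Hypothetical sample file with Windows-style line endings.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"Parse me\r\nusing spaces,\r\nplease.\r\n")

space_ok = newline_is_accepted(path, " ")    # a custom separator
crlf_ok = newline_is_accepted(path, "\r\n")  # one of the standard values

with open(path, "r") as f:   # universal newlines is the default in text mode
    lines = f.readlines()
    seen = f.newlines        # line endings encountered while reading

os.remove(path)
print(space_ok, crlf_ok, repr(seen))
```

So the built-in machinery can tell you which of the standard line endings a file uses, but it cannot be told to treat an arbitrary character as the line ending.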

Here is my generator for reading a file without loading everything into memory:

def customReadlines(fileNextBuff, char):
    """
        fileNextBuff: a function returning the next buffer, or "" on EOF
        char: the string on which the lines are split; it is not included
              in the yielded elements
    """
    lastLine = ""
    lenChar = len(char)
    while True:
        thisLine = fileNextBuff()  # call the function to fetch the next buffer
        if not thisLine:
            break  # EOF
        fnd = thisLine.find(char)
        while fnd != -1:
            yield lastLine + thisLine[:fnd]
            lastLine = ""
            thisLine = thisLine[fnd + lenChar:]
            fnd = thisLine.find(char)
        lastLine += thisLine
    yield lastLine


### EXAMPLES ###

# open file.bin and print each part of the file ending with a null terminator, loading a buffer of 256 characters at a time
with open("file.bin", "r") as f:
    for l in customReadlines((lambda: f.read(0x100)), "\0"):
        print(l)

# open the file errors.log and split it on a special string, loading a whole line at a time
with open("errors.log", "r") as f:
    for l in customReadlines(f.readline, "ERROR:"):
        print(l)
        print(" " + '-' * 78) # some separator
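One caveat with fixed-size buffers in general: a separator longer than one character can straddle two buffers (say, "ERR" at the end of one read and "OR:" at the start of the next), and a search done on each buffer in isolation would miss it. The sketch below is a variation of my own rather than the generator above. It searches the accumulated text instead of only the newest buffer. The name iter_records and the io.StringIO sample are purely illustrative.

```python
import io

def iter_records(read_chunk, sep):
    """Yield the pieces of a stream separated by `sep`.

    `read_chunk` is a zero-argument callable returning the next buffer,
    or an empty string at EOF.
    """
    pending = ""
    while True:
        chunk = read_chunk()
        if not chunk:
            break
        pending += chunk
        while True:
            head, found, tail = pending.partition(sep)
            if not found:
                # No complete separator yet; `pending` may end with part
                # of one, so keep it and wait for the next buffer.
                break
            yield head
            pending = tail
    yield pending

sample = io.StringIO("startERROR:firstERROR:second")
# A 4-character buffer guarantees that "ERROR:" is cut across reads.
records = list(iter_records(lambda: sample.read(4), "ERROR:"))
print(records)
```

With a single-character separator such as "\0", or when each buffer is a whole line and the separator contains no newline, this situation cannot arise, so the examples above are not affected.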

Answer 2

Would this do what you need?

def newreadline(f, newlinechar='\0'):
    c = f.read(1)
    b = [c]
    while(c != newlinechar and c != ''):
        c = f.read(1)
        b.append(c)
    return ''.join(b)

Edit: added a replacement for readlines():

def newreadlines(f, newlinechar='\0'):
    line = newreadline(f, newlinechar)
    while line:
        yield line
        line = newreadline(f, newlinechar)

So that the OP can do something like the following:

for line in newreadlines(f, newlinechar='\0'):
    print(line)
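Note that newreadline returns each item with its trailing separator still attached, which is what lets an empty string unambiguously signal EOF. If the separator should be dropped, as in the Perl 6 output in the question, a variation along these lines might work. This is only a sketch with a name of my own (iter_items). It uses the two-argument form of iter() to keep calling f.read(1) until it returns '', so it assumes a file opened in text mode.

```python
import io
from functools import partial

def iter_items(f, sep):
    """Yield the items of text stream `f` separated by `sep`, without `sep`."""
    chars = []
    # iter(callable, sentinel) stops as soon as f.read(1) returns '' (EOF).
    for c in iter(partial(f.read, 1), ''):
        if c == sep:
            yield ''.join(chars)
            chars = []
        else:
            chars.append(c)
    if chars:
        # Last item, when the stream does not end with a separator.
        yield ''.join(chars)

sample = io.StringIO("Parse me using spaces, please.")
items = list(iter_items(sample, ' '))
print(items)
```

Reading one character at a time is simple but makes one read call per character, so for large files a chunked approach is probably the better fit.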

Answer 3

def parse(fp, split_char, read_size=16):
    def give_chunks():
        while True:
            stuff = fp.read(read_size)
            if not stuff:
                break
            yield stuff
    leftover = ''
    for chunk in give_chunks():
        *stuff, leftover =  (leftover + chunk).split(split_char)
        yield from stuff
    if leftover:
        yield leftover

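If it is acceptable for newlines to act as separators in addition to split_char, you can use the simpler version below (for example, to read a text file word by word). Note that the last item of each line keeps its trailing newline: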

def parse(fobj, split_char):
    for line in fobj:
        yield from line.split(split_char)

In [5]: for word in parse(open('stuff.txt'), ' '):
   ...:     print(word)
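For the word-by-word case specifically, str.split() called without arguments splits on any run of whitespace, newlines included, so no separator has to be chosen at all. A minimal sketch, assuming whitespace-separated words are what is wanted (iter_words and the in-memory sample are illustrative, not from the answer):

```python
import io

def iter_words(fobj):
    """Yield whitespace-separated words, reading one line at a time."""
    for line in fobj:
        # split() with no argument drops the trailing newline and
        # collapses runs of spaces or tabs.
        yield from line.split()

sample = io.StringIO("Parse me  using\nspaces, please.\n")
words = list(iter_words(sample))
print(words)
```

This still holds only one line in memory at a time, so it does not help with a file that is a single very long line.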

How do I change the default newline character when reading lines from a file in Python 3?
