Question

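For example, suppose I have a text/log file with a very simple structure: it contains several sections with different formats, separated by marker lines, for example: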

0x23499 0x234234 0x234234
...
0x34534 0x353454 0x345464
$$$NEW_SECTION$$$
4345-34534-345-345345-3453
3453-34534-346-766788-3534
...

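How can I read the file section by section? For example, read everything before the $$$NEW_SECTION$$$ marker into one variable, then read what follows it (without using regular expressions and the like). Is there a simple solution for this?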

Answer 1

Here is a solution that does not read the whole file into memory:

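 data1 = []
 pos = 0
 with open('data.txt', 'r') as f:
     # readline() keeps f.tell() accurate, unlike "for line in f"
     line = f.readline()
     while line and not line.startswith('$$$'):
         data1.append(line)
         line = f.readline()

     # the marker line has already been consumed, so pos points just past it
     pos = f.tell()

 data2 = []
 with open('data.txt', 'r') as f:
     f.seek(pos)  # resume right after the marker line
     for line in f:
         data2.append(line)

 print(data1)
 print(data2)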

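The first loop deliberately uses readline() rather than for line in f, so that f.tell() reports an accurate position: iterating over the file reads ahead, and in Python 3 calling f.tell() inside such a loop raises an OSError in text mode.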
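If the saved position is not needed for anything else, a single pass over one file handle may be enough, because a file object is its own iterator and simply continues where the first loop stopped. The following is a minimal sketch under that assumption; the function name read_two_parts and the MARKER constant are illustrative names, not part of the answer above.

```python
MARKER = '$$$NEW_SECTION$$$'  # marker text taken from the question


def read_two_parts(path, marker=MARKER):
    """Return (lines before the marker, lines after it) using one file handle."""
    before = []
    with open(path, 'r') as f:
        for line in f:
            if line.startswith(marker):
                break  # the marker line itself is dropped
            before.append(line)
        # The same iterator resumes right after the marker line;
        # this is an empty list if the marker never appeared.
        after = list(f)
    return before, after
```

This avoids tell() and seek() entirely, at the cost of not knowing the byte offset of the second section.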

Answer 2

The simplest solution is str.split:

>>> s = filecontents.split("$$$NEW_SECTION$$$")
>>> s[0]
'0x23499 0x234234 0x234234\n\n0x34534 0x353454 0x345464\n'
>>> s[1]
'\n4345-34534-345-345345-3453\n3453-34534-346-766788-3534'
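When only one marker is expected, str.partition may be a slightly tidier alternative to split, since it always returns exactly three pieces and makes a missing marker easy to detect. The sketch below assumes the whole file content is already in a string; split_once and MARKER are illustrative names.

```python
MARKER = '$$$NEW_SECTION$$$'  # marker text taken from the question


def split_once(text, marker=MARKER):
    """Split text at the first marker and return two lists of lines."""
    head, found, tail = text.partition(marker)
    if not found:
        raise ValueError('marker not found')
    # tail starts with the newline that ended the marker line;
    # lstrip('\n') drops it (and any blank lines that directly follow).
    return head.splitlines(), tail.lstrip('\n').splitlines()
```

Like split, this needs the entire content in memory, so it suits small files best.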

Answer 3

Solution 1:

If the file is not very large:

with open('your_log.txt') as f:
  parts = f.read().split('$$$NEW_SECTION$$$')
  if len(parts) > 0:
    part1 = parts[0]
    ...

Solution 2:

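def FileParser(filepath):
  with open(filepath) as f:
    part = ''
    for line in f:
      # compare without the trailing newline
      if line.rstrip('\n') == '$$$NEW_SECTION$$$':
        # marker reached: emit the section collected so far
        yield part
        part = ''
      else:
        part += line
    # emit whatever follows the last marker
    yield part


for segment in FileParser('your_log.txt'):
    print(segment)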

Note: this code is untested, so please verify it before use.

Answer 4

Solution:

def sec(file_, sentinel):
    with open(file_) as f:
        section = []
        for i in iter(f.readline, ''):
            if i.rstrip() == sentinel:
                yield section
                section = []
            else:
                section.append(i)
        yield section

Usage:

>>> from pprint import pprint
>>> pprint(list(sec('file.txt', '$$$NEW_SECTION$$$')))
[['0x23499 0x234234 0x234234\n', '0x34534 0x353454 0x345464\n'],
 ['4345-34534-345-345345-3453\n',
  '3453-34534-346-766788-3534\n',
  '3453-34534-346-746788-3534\n']]
>>>

To get the sections into variables or, better, into a dict:

>>> sections = {}
>>> for n, section in enumerate(sec('file.txt', '$$$NEW_SECTION$$$')):
...     sections[n] = section
>>>
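The same generator idea can be written against any iterable of lines rather than a file path, which makes it easy to try out on an in-memory list. This is only a sketch: sections_from_lines and the sample data are illustrative, not part of the answer above.

```python
def sections_from_lines(lines, sentinel):
    """Yield lists of lines, starting a new list at every sentinel line."""
    section = []
    for line in lines:
        if line.rstrip() == sentinel:
            yield section
            section = []
        else:
            section.append(line)
    yield section  # the section after the last sentinel


sample = [
    '0x23499 0x234234 0x234234\n',
    '0x34534 0x353454 0x345464\n',
    '$$$NEW_SECTION$$$\n',
    '4345-34534-345-345345-3453\n',
    '3453-34534-346-766788-3534\n',
]

# dict(enumerate(...)) builds the same mapping as the explicit loop above
sections = dict(enumerate(sections_from_lines(sample, '$$$NEW_SECTION$$$')))
```

An open file object can be passed as lines directly, since iterating over it yields one line at a time.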

How do I read from a file after some 'marker'?
