Reading the lines between headers in a file in Python

Time: 2018-02-26 15:48:57

Tags: python python-3.x

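I have a large text file whose values are separated by headers starting with "#". If a condition matches the one in a header, I want to read the file up to the next "#" header and SKIP the rest of the file.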

To test this, I am trying to read the following text file, named test234.txt:

# abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
# something
njndjen kj
ejkndjke
#vcrvr

The code I wrote is:

file_t = open('test234.txt')
cond = True
while cond:
    for line_ in file_t:
        print(line_)
        if file_t.read(1) == "#":
            cond = False
file_t.close()

However, the output I get is:

# abcdefgh

fnrnf

rkfr

foiernfr

erfnr

something

jndjen kj

jkndjke

vcrvr

Instead, I want only the output between the two "#" headers, which is:

1fnrnf
mrkfr
nfoiernfr
nerfnr

How can I do this? Thanks!

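Edit: Reading in file block by block using specified delimiter in python talks about reading a file in groups separated by headers, but I don't want to read all the headers. I only want to read the section under the header that satisfies a given condition, and as soon as reading reaches the next header marked with '#', it should stop reading the file.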
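The mangled output appears to come from file_t.read(1): each call consumes one extra character, so the next line produced by the for loop is missing its first character, and print() adds a second newline because each line_ already ends in '\n'. Setting cond = False also cannot stop the inner for loop, because while cond is only re-checked after that loop finishes. Below is a minimal sketch of the intended behaviour, assuming the "condition" is a predicate applied to the header line; read_section and is_wanted_header are hypothetical names introduced here, not part of the question:

```python
from io import StringIO


def read_section(lines, is_wanted_header):
    """Yield the lines after the first header accepted by is_wanted_header,
    stopping at the next line that starts with '#'."""
    inside = False
    for line in lines:
        line = line.rstrip('\n')  # drop the newline so print() doesn't double it
        if line.startswith('#'):
            if inside:
                break  # next header reached: stop iterating, skip the rest
            inside = is_wanted_header(line)
        elif inside:
            yield line


# Stand-in for open('test234.txt'); a real file object works the same way.
sample = StringIO("# abcdefgh\n1fnrnf\nmrkfr\nnfoiernfr\nnerfnr\n"
                  "# something\nnjndjen kj\nejkndjke\n#vcrvr\n")
for line in read_section(sample, lambda header: header == '# abcdefgh'):
    print(line)
```

With the real file this would be used as "with open('test234.txt') as f:" followed by the same for loop over read_section(f, ...).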

2 answers:

Answer 0 (score: 3)

itertools.groupby can help:

from io import StringIO
from itertools import groupby

text = '''# abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
# something
njndjen kj
ejkndjke
#vcrvr'''


with StringIO(text) as file:
    lines = (line.strip() for line in file)  # removing trailing '\n'
    # startswith() also works for empty lines, where x[0] would raise IndexError
    for key, group in groupby(lines, key=lambda x: x.startswith('#')):

        if key:
            # found a line that starts with '#'
            print('found header: {}'.format(next(group)))

        else:
            # group now contains all lines that do not start with '#'
            print('\n'.join(group))

Note that all of this is lazy: at any one time you only hold the lines between two headers in memory.
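A quick way to check the laziness claim is to count how many lines groupby actually pulls from its source. This sketch uses a hypothetical counted() wrapper, added only for the demonstration, and stops after the first body section. Note that groupby has to read one line past a group (here the next header) to know where the group ends, so 6 of the 9 lines are read:

```python
from itertools import groupby


def counted(lines, counter):
    """Yield lines unchanged while counting how many were pulled (demo helper)."""
    for line in lines:
        counter[0] += 1
        yield line


text = "# abcdefgh\n1fnrnf\nmrkfr\nnfoiernfr\nnerfnr\n# something\nnjndjen kj\nejkndjke\n#vcrvr"
pulled = [0]
lines = counted(text.splitlines(), pulled)

for is_header, group in groupby(lines, key=lambda x: x.startswith('#')):
    if not is_header:
        body = list(group)  # only this section is materialized
        break

print(body)       # ['1fnrnf', 'mrkfr', 'nfoiernfr', 'nerfnr']
print(pulled[0])  # 6: first header, four body lines, and the header that ended the group
```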

To read the actual file, replace "with StringIO(text) as file:" with "with open('test234.txt', 'r') as file:".

The output of the test is:

found header: # abcdefgh
1fnrnf
mrkfr
nfoiernfr
nerfnr
found header: # something
njndjen kj
ejkndjke
found header: #vcrvr
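The loop above prints every section. Since the question only asks for the section under one header, one possible adaptation (a sketch; section_after is a hypothetical helper, and the wanted header is assumed to be matched by its exact text) remembers the last header seen and returns the first body group that follows the wanted one:

```python
from io import StringIO
from itertools import groupby

text = "# abcdefgh\n1fnrnf\nmrkfr\nnfoiernfr\nnerfnr\n# something\nnjndjen kj\nejkndjke\n#vcrvr"


def section_after(lines, wanted):
    """Return the body lines that follow the header equal to `wanted`."""
    last_header = None
    for is_header, group in groupby(lines, key=lambda x: x.startswith('#')):
        if is_header:
            # consecutive headers form one group; keep the last of them
            last_header = list(group)[-1]
        elif last_header == wanted:
            return list(group)  # stop as soon as the wanted section ends
    return []


with StringIO(text) as file:
    lines = (line.rstrip('\n') for line in file)
    print(section_after(lines, '# abcdefgh'))  # ['1fnrnf', 'mrkfr', 'nfoiernfr', 'nerfnr']
```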

Update, since I misunderstood the question. Here is a new attempt:

from io import StringIO
from collections import deque
from itertools import takewhile

from_line = '# abcdefgh'
to_line = '# something'

# text is the sample string defined in the first snippet
with StringIO(text) as file:
    lines = (line.strip() for line in file)  # removing trailing '\n'

    # fast-forward up to from_line
    deque(takewhile(lambda x: x != from_line, lines), maxlen=0)

    for line in takewhile(lambda x: x != to_line, lines):
        print(line)

I use itertools.takewhile to take lines from the iterator until a condition stops holding (in your case, until the given header line is reached).
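One detail that makes this work: takewhile has to read the first element that fails the predicate in order to stop, and that element is consumed, not yielded. That is why the from_line header itself is dropped by the fast-forward step and why to_line is never printed. A small demonstration with made-up items:

```python
from itertools import takewhile

it = iter(['a', 'b', '#h', 'c', 'd'])
before = list(takewhile(lambda x: x != '#h', it))
after = list(it)  # what is left in the underlying iterator

print(before)  # ['a', 'b']
print(after)   # ['c', 'd']: '#h' was consumed by takewhile and is gone
```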

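The deque part is just the consume pattern suggested in the itertools recipes. It simply fast-forwards the iterator to the point where the given condition no longer holds.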
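For reference, the consume recipe from the itertools documentation looks like the following; with n=None it is exactly the deque(..., maxlen=0) idiom used above. Here it is applied to the same fast-forward step on a few lines taken from the question:

```python
import collections
from itertools import islice, takewhile


def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)


lines = iter(['# abcdefgh', '1fnrnf', 'mrkfr', '# something', 'njndjen kj'])

# Fast-forward past from_line; takewhile also swallows the header itself.
consume(takewhile(lambda x: x != '# abcdefgh', lines))
first = next(lines)   # '1fnrnf'

consume(lines, 1)     # skip exactly one more line ('mrkfr')
second = next(lines)  # '# something'
print(first, second)
```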

Answer 1 (score: 1)

Learn and use regular expressions. They will help you with all kinds of document-processing tasks.

import re #regex library

with open('test234.txt') as f:  #file stream
    lines = f.readlines()       #reads all lines

p = re.compile('^#.*')          #regex pattern creation

for l in lines:
    if p.match(l) is None:      #looks for non-matching lines
        print(l.rstrip('\n'))   #strips only the trailing newline ([:-2] also cut off the last character)
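The loop above prints every non-header line, not only the section under one header. If reading the whole file into memory is acceptable, a single regular expression can pull out just that section. A sketch, assuming the header is matched by its exact text (escaped with re.escape) and that a section ends at the next line starting with '#' or at the end of the text; extract_section is a hypothetical helper name:

```python
import re


def extract_section(text, header):
    """Return the lines between the line equal to `header` and the next '#' line."""
    pattern = re.compile(
        r'^' + re.escape(header) + r'$\n?'  # the header line itself, matched whole
        r'(.*?)'                           # lazily capture the body (re.S lets . match '\n')
        r'(?=^#|\Z)',                      # stop before the next header or at end of text
        re.M | re.S,
    )
    m = pattern.search(text)
    return m.group(1).splitlines() if m else []


sample = "# abcdefgh\n1fnrnf\nmrkfr\nnfoiernfr\nnerfnr\n# something\nnjndjen kj\nejkndjke\n#vcrvr\n"
print(extract_section(sample, '# abcdefgh'))  # ['1fnrnf', 'mrkfr', 'nfoiernfr', 'nerfnr']
```

With the real file, the text would come from "with open('test234.txt') as f: text = f.read()". Unlike the takewhile approach, this reads the whole file before searching.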