How to read the text between two headings and return the names of the headings whose text matches a pattern

Date: 2017-06-14 21:25:36

Tags: python

I have a txt file like this:

heading 1
 abad askjd asdfj dk skldfj
 askdja ajsdk a ajksdj ajsdk a ajksd
 value yes never dont care about this
 asdj aksjd sd asda sda
 alsdk skldfj sd asda sda
heading 2
 asd asd dfgfd dk a ajksdj
 asdas asd asd sd asda sda
 asdas asdasd dk a ajksdj
 value 123 456 dk a ajksdj
 asdasd asdasd
heading 3
 asd asda sda jsdk a ajksdj
 dsfgd dfgd g dk a aj
 dfgfdg dfgfd dk skldfj
 value yes never dont care about this
 asdasd asdasd gd dfgd g dk 

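In the file above, I want to get every heading that has "value yes never" in the text under it, so the output for this file should be:

heading 1 heading 3

How can I do this in Python?

4 answers:

Answer 0: (score: 0)

I suggest building a simple loop that iterates over the text line by line. I'm not sure the format matches exactly what you posted, but you could do something like this: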

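fname = "file.txt"
with open(fname) as f:
    for line in f:
        # Heading lines are the ones that do not start with a space
        if not line.startswith(' '):
            print(line.rstrip())  # print() is a function in Python 3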

Answer 1: (score: 0)

This works for me.

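    headings = []
    headingsCount = 0
    positionOFValueYesNever = []

    with open("file.txt", 'r') as f:  # the file is closed automatically
        for line in f:
            if not line.startswith(' '):
                headings.append(line.strip())
                headingsCount += 1
            # find() returns an index (or -1), so comparing it to True only
            # works by accident; use `in` for a substring test
            elif 'value yes never' in line:
                positionOFValueYesNever.append(headingsCount - 1)

    for i in positionOFValueYesNever:
        print(headings[i])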

Answer 2: (score: 0)

The following code will print what you want:

fname = "file.txt"
with open(fname) as f:
    headings = []
    for line in f:
        if not line.startswith(' '):
            current_heading = line.strip()
        else:
            if 'value yes never' in line:
                headings.append(current_heading)

print(headings)

If you expect the text "value yes never" to occur on more than one line under the same heading, you can deduplicate at the end:

print(set(headings))

Or add a check before appending:

if current_heading not in headings:
    headings.append(current_heading)
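
Note that `set` does not keep the headings in file order, while the `not in` check does. As a minimal sketch, the same loop can be wrapped in a function that takes the lines and a regular expression, so it can be tried without a file. The name `headings_matching` and the shortened sample text are made up for illustration and are not part of the question; blank lines are skipped on the assumption that they are not headings.

```python
import re

def headings_matching(lines, pattern):
    """Return headings (in file order, no duplicates) whose indented
    body contains a line matching the regular expression `pattern`."""
    regex = re.compile(pattern)
    found = []
    current_heading = None  # no heading seen yet
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are neither heading nor body
        if not line.startswith(' '):
            current_heading = line.strip()
        elif current_heading is not None and regex.search(line):
            if current_heading not in found:
                found.append(current_heading)
    return found

# Shortened, hypothetical sample in the same layout as the question's file
sample = """heading 1
 abad askjd
 value yes never dont care about this
 value yes never again
heading 2
 value 123 456 dk a ajksdj
heading 3
 value yes never dont care about this
"""

print(headings_matching(sample.splitlines(), r'\bvalue yes never\b'))
```

With a real file, the lines could be passed in as `headings_matching(open(fname), ...)` inside a `with` block, since a file object iterates line by line.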

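Answer 3: (score: 0)

You can use groupby from the itertools module to group the lines by a condition (whether a line starts with a space). Then regroup the result into pairs of heading and body.

Here is an example: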

from itertools import groupby
# Assuming your input file is called: input.txt
with open('input.txt', 'r') as f:
    data = f.read().splitlines()

sub = []
for _, v in groupby(data, lambda x: x.startswith(' ')):
    sub.append([j.strip() for j in list(v)])

final = [sub[k:k+2] for k in range(0, len(sub), 2)]
print(final)

Output:

[
 [['heading 1'], ['abad askjd asdfj dk skldfj', 'askdja ajsdk a ajksdj ajsdk a ajksd', 'value yes never dont care about this', 'asdj aksjd sd asda sda', 'alsdk skldfj sd asda sda']],
 [['heading 2'], ['asd asd dfgfd dk a ajksdj', 'asdas asd asd sd asda sda', 'asdas asdasd dk a ajksdj', 'value 123 456 dk a ajksdj', 'asdasd asdasd']],
 [['heading 3'], ['asd asda sda jsdk a ajksdj', 'dsfgd dfgd g dk a aj', 'dfgfdg dfgfd dk skldfj', 'value yes never dont care about this', 'asdasd asdasd gd dfgd g dk']]
]
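
The `final` list still has to be filtered to get the heading names the question asks for. A rough sketch of that last step follows, assuming the file starts with a heading and every heading is followed by at least one indented line (otherwise the pairing by twos goes out of step). The sample lines are a shortened, inlined version of the file rather than being read from `input.txt`, and the name `matches` is made up for illustration.

```python
from itertools import groupby

# Shortened sample, inlined so the sketch runs without input.txt
data = """heading 1
 abad askjd asdfj dk skldfj
 value yes never dont care about this
heading 2
 value 123 456 dk a ajksdj
heading 3
 dfgfdg dfgfd dk skldfj
 value yes never dont care about this""".splitlines()

# Same grouping as above: runs of heading lines alternate with runs of body lines
sub = [[j.strip() for j in v] for _, v in groupby(data, lambda x: x.startswith(' '))]
final = [sub[k:k+2] for k in range(0, len(sub), 2)]

# Keep the heading of every [heading, body] pair whose body contains the text
matches = [
    pair[0][0]
    for pair in final
    if len(pair) == 2 and any('value yes never' in line for line in pair[1])
]
print(matches)  # ['heading 1', 'heading 3']
```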