Question

Could someone help me with the example below? (If I use re.DOTALL, it reads all the way to the end of the file.)

import re

text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s"

names = ['A', 'K']
for name in names:
    print name
    print re.findall("Found to {0} from:\n\t\-(.+)".format(name), text)

The text is like the string above.

Output:

A

['B', 'D']

K

['B']

Desired output:

A

['B', 'C', 'D']

K

['B', 'D', 'E']

Answer 1

Here is another approach (Python 2.7.x):

import re
text = 'Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s'
for name in ['A', 'K']:
    print name
    print [ n for i in re.findall('(?:Found to ' + name + ' from:)(?:\\n\\t-([A-Z]))(?:\\n\\t-([A-Z]))?(?:\\n\\t-([A-Z]))?', text) for n in i if n ]

Output:

A
['B', 'C', 'D']
K
['B', 'D', 'E']

UPDATE: If you don't know how many (?:\n\t-([A-Z])) groups there will be, I suggest the following approach:

import re
text = 'Found to A from:\n\t-B\n\t-C\n\t-G\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s'
for name in ['A', 'K']:
    print name
    groups = re.findall('(?:Found to ' + name + ' from:)((?:\\n\\s*-(?:[A-Z]))+)', text)
    print reduce(lambda i,j: i + j, map(lambda x: re.findall('\n\s*-([A-Z])', x), groups))

Output:

A
['B', 'C', 'G', 'D']
K
['B', 'D', 'E']
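
For readers on Python 3, here is a minimal sketch of the same two-step idea: first capture each whole run of "-X" lines that follows a header, then pull the letters out of every run. The function name sources_for is hypothetical, and the use of re.escape and a list comprehension in place of reduce/map are my assumptions, not part of the code above.

```python
import re

text = ('Found to A from:\n\t-B\n\t-C\n\t-G\nFound to K from:\n\t-B\n\t-D\n\t-E\n'
        'Found to A from:\n\t-D\nMax time: 20s')

def sources_for(name, text):
    """Return every entry listed under any "Found to <name> from:" header."""
    # Step 1: capture each complete run of "\n<whitespace>-X" lines after the header.
    # re.escape (an assumption) keeps names with regex metacharacters safe.
    header = 'Found to ' + re.escape(name) + ' from:'
    blocks = re.findall(header + r'((?:\n\s*-[A-Z])+)', text)
    # Step 2: extract the letters from each run and flatten into one list.
    return [letter for block in blocks for letter in re.findall(r'-([A-Z])', block)]

for name in ['A', 'K']:
    print(name)
    print(sources_for(name, text))
```

Because the outer group repeats with +, the number of entries per block does not need to be known in advance.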

Answer 2

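When I wrote this answer, I was trying to answer your original question, in which you had a file with specific content to parse. I think my answer still applies. If you have a string instead, change

for line in f:

to

for line in f.splitlines():

and pass the string rather than a file object to keys_and_values.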
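
One caveat with the string variant: a file object is its own iterator, so the second loop in keys_and_values resumes where the first one stopped, whereas calling splitlines() in each loop would start again from the first line. A minimal Python 3 sketch that shares one iterator between both loops follows. The name keys_and_values_from_string, the captured header group, and the rule of keeping only lines that start with "-" (so that a trailer such as "Max time: 20s" is ignored) are my assumptions, not part of the original answer.

```python
import re
from collections import OrderedDict

HEADER = re.compile(r'^Found to ([A-Z]) from:$')

def keys_and_values_from_string(text):
    # One shared iterator: the second loop continues after the first header.
    lines = iter(text.splitlines())
    key = None
    # Discard anything before the first header line.
    for line in lines:
        match = HEADER.match(line.strip())
        if match:
            key = match.group(1)
            break
    # Yield (key, value) tuples; only "-X" lines count as values (assumption).
    for line in lines:
        line = line.strip()
        match = HEADER.match(line)
        if match:
            key = match.group(1)
        elif line.startswith('-'):
            yield key, line[1:]

text = ("some header\nFound to A from:\n\t-B\n\t-C\nFound to K from:\n"
        "\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s")

result = OrderedDict()
for k, v in keys_and_values_from_string(text):
    result.setdefault(k, []).append(v)

for k in result:
    print('{}\n{}\n'.format(k, result[k]))
```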

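Original answer:

Honestly, I think this looks like a task where the heavy lifting should be done by a generator, with a little help from regular expressions.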

import re
from collections import OrderedDict

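def keys_and_values(f):
    # f must be an iterator (such as an open file) so that the second loop
    # resumes where the first one stopped
    # discard any headers: skip lines until the first "Found to X from:" line
    target = r'^\s*Found to [A-Z] from:\s*$'
    for line in f:
        if re.match(target, line.strip()):
            break

    # yield (key, value) tuples; the key is the letter at index 9 of the header
    key = line.strip()[9]
    for line in f:
        line = line.strip()
        if re.match(target, line):
            key = line[9]
        elif line:
            yield (key, line)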

result = OrderedDict()
with open('testfile.txt') as f:
    for k,v in keys_and_values(f):
        result.setdefault(k, []).append(v)

for k in result:
    print('{}\n{}\n'.format(k, result[k]))

Demo:

$ cat testfile.txt 
some
useless
header
lines

Found to A from:

B

C

Found to K from:

B

D

E

Found to A from:

D
$ python parsefile.py
A
['B', 'C', 'D']

K
['B', 'D', 'E']

Answer 3

Not generic, but it works for your case, it is simple, and it uses findall as you mentioned.

import re

text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\n"

names = ['A', 'K']
for name in names:
    print name
    test = re.findall("Found to {0} from:\n\t-([A-Z])(\n\t)?-?([A-Z])?(\n\t)?-?([A-Z])?".format(name), text)
    # normalize it
    prettyList = []
    for (a,b,c,d,e) in test:
        prettyList.append(a)
        prettyList.append(c)
        prettyList.append(e)
    print [x for x in prettyList if x]

Output:

A
['B', 'C', 'D']
K
['B', 'D', 'E']

I know there may be many cases with more than 3 elements, in which case you would have to add extra matches.
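
If the number of elements per block is not fixed, one way to avoid adding optional groups by hand is to split the text on the header lines and then collect every "-X" line from each piece. This is a Python 3 sketch of a different technique from the pattern above, not an extension of it. The sample text with a fourth entry under K and the names parts and found are made up for illustration.

```python
import re
from collections import defaultdict

text = ("Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\n\t-F\n"
        "Found to A from:\n\t-D\n")

# Splitting on a pattern with a capture group keeps the captured names:
# parts == ['', 'A', body, 'K', body, 'A', body]
parts = re.split(r"^Found to ([A-Z]) from:\n", text, flags=re.MULTILINE)

found = defaultdict(list)
for name, body in zip(parts[1::2], parts[2::2]):
    # Every line of the form "<tab>-<entry>" contributes one entry.
    found[name].extend(re.findall(r"^\t-(.+)$", body, flags=re.MULTILINE))

for name in ['A', 'K']:
    print(name)
    print(found[name])
```

With this text it prints ['B', 'C', 'D'] for A and ['B', 'D', 'E', 'F'] for K, and lines that do not start with a tab and a dash are ignored.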

Using "re" to find a repeated pattern in text (Python)
