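Can someone help me with the example below? I want to collect every item listed under each name, but if I use re.DOTALL, the match runs all the way to the end of the file: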
import re
text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s"
names = ['A', 'K']
for name in names:
    print(name)
    print(re.findall(r"Found to {0} from:\n\t\-(.+)".format(name), text))
Output:
A
['B', 'D']
K
['B']
Desired output:
A
['B', 'C', 'D']
K
['B', 'D', 'E']
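re.DOTALL overshoots because "." can then cross newlines, so a greedy (.+) runs to the end of the text. A minimal sketch, assuming every item line is indented and every other line (the "Found to" headers, "Max time: 20s") starts in column 0: keep DOTALL but make the capture lazy and stop it with a lookahead at the next unindented line, then pull the items out of each block. The helper name items_for is hypothetical.

```python
import re

text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s"

def items_for(name, text):
    # Lazy (.*?) under DOTALL may cross newlines, but the lookahead stops it at
    # the first line that starts with a non-space character, or at end of text.
    block_pattern = r"^Found to {0} from:\n(.*?)(?=^\S|\Z)".format(re.escape(name))
    blocks = re.findall(block_pattern, text, re.MULTILINE | re.DOTALL)
    # Within each block, take whatever follows the "-" on every item line.
    return [item
            for block in blocks
            for item in re.findall(r"^\s*-(.+)$", block, re.MULTILINE)]

for name in ['A', 'K']:
    print(name)
    print(items_for(name, text))
```

Because the second A block is collected by the same findall call, items from repeated headers are merged in document order.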
Answer 0 (score: 4)
Here is another approach (Python 2.7.x):
import re
text = 'Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s'
for name in ['A', 'K']:
    print(name)
    # the first item is required, up to two more are optional; empty groups are filtered out
    pattern = r'(?:Found to ' + name + r' from:)(?:\n\t-([A-Z]))(?:\n\t-([A-Z]))?(?:\n\t-([A-Z]))?'
    print([n for i in re.findall(pattern, text) for n in i if n])
Output:
A
['B', 'C', 'D']
K
['B', 'D', 'E']
UPDATE: If you don't know how many (?:\n\t-([A-Z])) groups there will be, I suggest the following approach:
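import re
from functools import reduce  # built in on Python 2, must be imported on Python 3
text = 'Found to A from:\n\t-B\n\t-C\n\t-G\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s'
for name in ['A', 'K']:
    print(name)
    # capture the whole run of "\n<whitespace>-X" lines that follows each header
    groups = re.findall(r'(?:Found to ' + name + r' from:)((?:\n\s*-(?:[A-Z]))+)', text)
    # extract the letters from each run and concatenate the lists ([] avoids an error when nothing matches)
    print(reduce(lambda i, j: i + j, map(lambda x: re.findall(r'\n\s*-([A-Z])', x), groups), []))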
Output:
A
['B', 'C', 'G', 'D']
K
['B', 'D', 'E']
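The same two-step idea can be written without reduce. A sketch that uses itertools.chain.from_iterable to flatten the per-block lists; it returns an empty list for a name with no blocks. items_for is a hypothetical helper name.

```python
import re
from itertools import chain

text = 'Found to A from:\n\t-B\n\t-C\n\t-G\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s'

def items_for(name, text):
    # Step 1: capture the whole run of "\n<whitespace>-X" lines after each header.
    header = r'Found to ' + re.escape(name) + r' from:'
    blocks = re.findall(header + r'((?:\n\s*-[A-Z])+)', text)
    # Step 2: extract the letters from every block and flatten into one list.
    return list(chain.from_iterable(re.findall(r'\n\s*-([A-Z])', block) for block in blocks))

for name in ['A', 'K']:
    print(name)
    print(items_for(name, text))
```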
Answer 1 (score: 2)
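When I wrote this answer, I was answering your original question, in which you had a file with specific content to parse. I think the answer still applies. If you have a string, pass an iterator over its lines to keys_and_values instead of a file object, for example:
keys_and_values(iter(text.splitlines()))
The iterator matters because both for loops in keys_and_values must consume the same sequence of lines; with a plain list such as text.splitlines(), the second loop would start again from the first line. Note also that the item lines in your string keep their leading "-" (e.g. "-B"), and a trailing line such as "Max time: 20s" would be collected as a value.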
Original answer:
Honestly, this looks to me like a task where the heavy lifting should be done by a generator, with some help from regular expressions.
import re
from collections import OrderedDict

def keys_and_values(f):
    # discard any header lines before the first "Found to X from:" line
    target = r'^\s*Found to [A-Z] from:\s*$'
    for line in f:
        if re.match(target, line.strip()):
            break
    # yield (key, value) tuples; the key is the single letter at index 9
    key = line.strip()[9]
    for line in f:
        line = line.strip()
        if re.match(target, line):
            key = line[9]
        elif line:
            yield (key, line)

result = OrderedDict()
with open('testfile.txt') as f:
    for k, v in keys_and_values(f):
        result.setdefault(k, []).append(v)

for k in result:
    print('{}\n{}\n'.format(k, result[k]))
Demo:
$ cat testfile.txt
some
useless
header
lines
Found to A from:
B
C
Found to K from:
B
D
E
Found to A from:
D
$ python parsefile.py
A
['B', 'C', 'D']
K
['B', 'D', 'E']
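For the string in the question, a single-loop variant avoids relying on a shared file iterator. This is a sketch under assumptions the original answer does not make: item lines start with "-" after stripping (the dash is dropped), other lines such as "Max time: 20s" are ignored, and names may be longer than one letter. The HEADER pattern and this version of keys_and_values are illustrative, not the answer's code.

```python
import re
from collections import OrderedDict

# Hypothetical header pattern; \w+ also allows names longer than one letter.
HEADER = re.compile(r'^Found to (\w+) from:$')

def keys_and_values(lines):
    # Works on any iterable of lines: a file object, a list, text.splitlines().
    key = None
    for line in lines:
        line = line.strip()
        match = HEADER.match(line)
        if match:
            key = match.group(1)
        elif key is not None and line.startswith('-'):
            # Item line such as "-B": yield it without the leading dash.
            yield key, line[1:]

text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\nMax time: 20s"

result = OrderedDict()
for k, v in keys_and_values(text.splitlines()):
    result.setdefault(k, []).append(v)

for k in result:
    print(k)
    print(result[k])
```

Since the generator only keeps the current key as state, the same function also works on an open file object.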
Answer 2 (score: 0)
Not general, but it works for your case, it's simple, and it uses findall as you mentioned.
import re
text = "Found to A from:\n\t-B\n\t-C\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\n"
names = ['A', 'K']
for name in names:
    print(name)
    test = re.findall(r"Found to {0} from:\n\t-([A-Z])(\n\t)?-?([A-Z])?(\n\t)?-?([A-Z])?".format(name), text)
    # normalize it: keep only the letter groups (a, c, e), dropping the "\n\t" groups
    prettyList = []
    for (a, b, c, d, e) in test:
        prettyList.append(a)
        prettyList.append(c)
        prettyList.append(e)
    print([x for x in prettyList if x])
Output:
A
['B', 'C', 'D']
K
['B', 'D', 'E']
If some entries have more than 3 elements, you will have to add extra optional matches to the pattern.
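Rather than typing the extra matches by hand, the pattern can be generated for a chosen upper bound. A sketch, where max_items is a hypothetical limit you pick yourself (items beyond it are silently missed) and items_for is a hypothetical helper:

```python
import re

def items_for(name, text, max_items):
    # One required item group, then (max_items - 1) optional ones.
    pattern = ('Found to ' + re.escape(name) + ' from:'
               + r'\n\t-([A-Z])'
               + r'(?:\n\t-([A-Z]))?' * (max_items - 1))
    items = []
    for match in re.findall(pattern, text):
        # With a single group, findall returns strings rather than tuples.
        groups = match if isinstance(match, tuple) else (match,)
        items.extend(g for g in groups if g)
    return items

text = "Found to A from:\n\t-B\n\t-C\n\t-G\nFound to K from:\n\t-B\n\t-D\n\t-E\nFound to A from:\n\t-D\n"
for name in ['A', 'K']:
    print(name)
    print(items_for(name, text, 5))
```

With max_items=2, the "G" under the first A block would be skipped, which is the limitation this approach keeps.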