I have a pattern like this:
name: steven
add: hyderabad
add: India

name: samuel
add: chennai
add: tamilnadu
add: India
...
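The number of address lines (add:) can vary from record to record. How can I write a regular expression that captures the name and the address contents?
Note that records are separated by \n\n, and the lines within a record are separated by a single newline.
Thanks in advance.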
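Answer 0 (score: 0)
I think the simplest approach is to avoid regular expressions altogether and use a generator instead. We accumulate the address lines that follow each name, then yield the results as (name, addresses) pairs: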
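def gen(file):
    """Yield (name, addresses) pairs from an iterable of lines."""
    name = None
    addresses = []
    for line in file:
        if line.startswith('name'):
            # Split on the first ':' only, so values may contain colons.
            name = line.split(':', 1)[1].strip()
        elif line.startswith('add'):
            addresses.append(line.split(':', 1)[1].strip())
        else:
            # Any other line (such as the blank separator) ends the current record.
            if name is not None:
                yield (name, addresses)
            name, addresses = None, []
    # Emit the last record if the input does not end with a blank line.
    if name is not None:
        yield (name, addresses)

filename = 'records.txt'  # hypothetical path; point this at your input file
with open(filename) as f:
    print(list(gen(f)))
# [('steven', ['hyderabad', 'India']), ('samuel', ['chennai', 'tamilnadu', 'India'])]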
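Because the question states that records are separated by a blank line (\n\n), another option is to split the text on that separator first and then parse each record on its own. The following is only a minimal sketch under that assumption: parse_records and sample are hypothetical names introduced for illustration, and the split would need adjusting if the separator lines contain stray whitespace or \r\n line endings.

```python
def parse_records(text):
    """Return a list of (name, addresses) tuples from blank-line-separated records."""
    records = []
    for block in text.strip().split('\n\n'):
        name = None
        addresses = []
        for line in block.splitlines():
            # partition splits on the first ':' only, so values may contain colons.
            key, _, value = line.partition(':')
            key, value = key.strip(), value.strip()
            if key == 'name':
                name = value
            elif key == 'add':
                addresses.append(value)
        # Skip blocks that have no name line (for example, empty input).
        if name is not None:
            records.append((name, addresses))
    return records


# Sample input mirroring the question, with a blank line between records.
sample = (
    "name: steven\nadd: hyderabad\nadd: India\n"
    "\n"
    "name: samuel\nadd: chennai\nadd: tamilnadu\nadd: India\n"
)
print(parse_records(sample))
```

Unlike the generator, this version needs the whole text in memory, which is usually fine for small inputs.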
Answer 1 (score: 0)
You can use re to parse the input data:
import re

data = """
name: steven
add: hyderabad
add: India
name: samuel
add: chennai
add: tamilnadu
add: India
"""

# Split the text into one chunk per record: each chunk runs from "name:"
# up to the next "name:" or the end of the input.
for group in re.findall(r'(name:.*?)(?:(?=name:)|\Z)', data, flags=re.DOTALL):
    # Within a chunk, capture the rest of the line after each label.
    name = re.findall(r'name:\s*([^\n]+)', group)
    addresses = re.findall(r'add:\s*([^\n]+)', group)
    print(name[0], addresses)
    print('-' * 80)
Prints:
steven ['hyderabad', 'India']
--------------------------------------------------------------------------------
samuel ['chennai', 'tamilnadu', 'India']
--------------------------------------------------------------------------------
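A possible variation scans the text once with re.finditer and re.MULTILINE, opening a new record at each name: line, so it does not depend on how records are separated. This is a sketch rather than part of the answer above: parse_with_regex and FIELD are hypothetical names, and the pattern assumes every field sits on its own line and starts at the beginning of that line.

```python
import re

# One field per line: "name: value" or "add: value".
# [ \t]* (rather than \s*) keeps the match from running past the end of the line.
FIELD = re.compile(r'^(name|add):[ \t]*(.*)$', flags=re.MULTILINE)


def parse_with_regex(text):
    """Return a list of (name, addresses) tuples found in text."""
    records = []
    for match in FIELD.finditer(text):
        key, value = match.group(1), match.group(2).strip()
        if key == 'name':
            # A name line opens a new record.
            records.append((value, []))
        elif records:
            # An add line belongs to the most recent record;
            # add lines that appear before any name are ignored.
            records[-1][1].append(value)
    return records


data = """
name: steven
add: hyderabad
add: India
name: samuel
add: chennai
add: tamilnadu
add: India
"""
print(parse_with_regex(data))
```

Restricting the whitespace after the colon to spaces and tabs means an empty add: line yields an empty string instead of swallowing the following line.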
Answer 2 (score: 0)