Python regex: possible matches after a match

Asked: 2018-07-31 14:12:07

Tags: python regex

I have data in the following pattern:

name: steven
add: hyderabad
add: India

name: samuel
add: chennai
add: tamilnadu
add: India


...

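The number of address lines (add:) can vary from record to record. How can I write a regular expression that captures the name and the address contents?

Note that records are separated by \n\n, and the lines within a record are separated by a single newline.

Thanks in advance.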

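3 Answers:

Answer 0 (score: 0)

I think the simplest approach is to avoid regular expressions entirely and just use a generator. We can accumulate the address lines that follow each name, then yield each name together with its addresses as a pair.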

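def gen(file):
    """Yield (name, addresses) pairs from an iterable of lines."""
    name = None
    addresses = []
    for line in file:
        if line.startswith('name'):
            # split only once so a value containing ':' stays intact
            name = line.split(':', 1)[1].strip()
        elif line.startswith('add'):
            addresses.append(line.split(':', 1)[1].strip())
        else:
            # any other line (e.g. the blank separator) ends the current record
            if name is not None:
                yield (name, addresses)
                name, addresses = None, []
    # emit the last record if the input does not end with a blank line
    if name is not None:
        yield (name, addresses)

filename = 'records.txt'  # hypothetical path to the input file
with open(filename) as f:
    print(list(gen(f)))

# [('steven', ['hyderabad', 'India']), ('samuel', ['chennai', 'tamilnadu', 'India'])]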
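
The same accumulate-then-yield idea can also be sketched with itertools.groupby, which groups consecutive lines by whether they are blank. This is only a rough variant, not part of the code above: the name gen_grouped and the sample string are made up for illustration, and it assumes records are separated by blank lines as the question states.

```python
from itertools import groupby


def gen_grouped(lines):
    """Yield (name, addresses) pairs from blank-line-separated records."""
    # Consecutive lines are grouped by whether they are blank.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if is_blank:
            continue  # separator lines carry no data
        name, addresses = None, []
        for line in group:
            # partition splits on the first ':' only
            key, _, value = line.partition(':')
            if key.strip() == 'name':
                name = value.strip()
            elif key.strip() == 'add':
                addresses.append(value.strip())
        if name is not None:
            yield (name, addresses)


# Hypothetical in-memory sample, so no file is needed to try it out.
sample = (
    "name: steven\nadd: hyderabad\nadd: India\n\n"
    "name: samuel\nadd: chennai\nadd: tamilnadu\nadd: India\n"
)
print(list(gen_grouped(sample.splitlines())))
```

Because it accepts any iterable of lines, it should work equally with an open file object or with text.splitlines().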

Answer 1 (score: 0)

You can use the re module to parse the input data:

data = """
name: steven
add: hyderabad
add: India

name: samuel
add: chennai
add: tamilnadu
add: India
"""

import re

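# Each record starts at "name:" and runs up to the next "name:" or the end of the input.
for group in re.findall(r'(name:.*?)(?:(?=name:)|\Z)', data, flags=re.DOTALL):
    # [^\n]+ stops at the end of the line, so DOTALL is not needed for these two searches
    name = re.findall(r'name:\s*([^\n]+)', group)
    addresses = re.findall(r'add:\s*([^\n]+)', group)
    print(name[0], addresses)
    print('-' * 80)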

Prints:

steven ['hyderabad', 'India']
--------------------------------------------------------------------------------
samuel ['chennai', 'tamilnadu', 'India']
--------------------------------------------------------------------------------
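
Since the question says records are separated by \n\n, a possibly simpler variant is to split on blank lines first and then search each record line by line with re.MULTILINE. The following is only a sketch under that assumption; parse_records and the sample string are illustrative names, not part of the code above.

```python
import re


def parse_records(text):
    """Return a list of (name, [addresses]) from blank-line-separated records."""
    records = []
    # One or more blank lines separate the records.
    for block in re.split(r'\n\s*\n', text.strip()):
        # With MULTILINE, ^ and $ match at the start and end of each line.
        name = re.search(r'^name:[ \t]*(.+)$', block, flags=re.MULTILINE)
        if name is None:
            continue  # skip blocks that have no name line
        addresses = re.findall(r'^add:[ \t]*(.+)$', block, flags=re.MULTILINE)
        records.append((name.group(1).strip(), [a.strip() for a in addresses]))
    return records


# Hypothetical sample in the format shown in the question.
sample = (
    "name: steven\nadd: hyderabad\nadd: India\n\n"
    "name: samuel\nadd: chennai\nadd: tamilnadu\nadd: India\n"
)
for name, addresses in parse_records(sample):
    print(name, addresses)
```

Splitting first keeps each pattern short, at the cost of relying on the blank-line separator always being present.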

Answer 2 (score: 0)

Try the following pattern: name: [a-zA-Z]+\n(add: [a-zA-Z0-9]+\n)+

It matches groups with this structure:

name: ...
add: ...
...
add: ...
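
A rough sketch of how this pattern might be used from Python follows. Note that it is an adaptation, not the pattern exactly as written: capture groups have been added around the name and around the whole run of add: lines, because a repeated capturing group in Python's re keeps only its last repetition. The names RECORD, extract and sample are made up for illustration. The pattern also requires a newline after every line, so the final record needs a trailing \n to match.

```python
import re

# Adapted pattern: group 1 is the name, group 2 is the whole run of "add:" lines.
RECORD = re.compile(r'name: ([a-zA-Z]+)\n((?:add: [a-zA-Z0-9]+\n)+)')


def extract(text):
    """Return a list of (name, [addresses]) for every record the pattern matches."""
    results = []
    for match in RECORD.finditer(text):
        name = match.group(1)
        # A second pass pulls the individual addresses out of the matched run.
        addresses = re.findall(r'add: ([a-zA-Z0-9]+)', match.group(2))
        results.append((name, addresses))
    return results


# Hypothetical sample in the format shown in the question.
sample = (
    "name: steven\nadd: hyderabad\nadd: India\n\n"
    "name: samuel\nadd: chennai\nadd: tamilnadu\nadd: India\n"
)
print(extract(sample))
# [('steven', ['hyderabad', 'India']), ('samuel', ['chennai', 'tamilnadu', 'India'])]
```

Because the character classes only allow letters and digits, names or addresses containing spaces or punctuation would not match and would need a wider class such as [^\n]+.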
