Question

My regular expression is as follows:

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'

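Getting the matched strings with m.group(name) is no problem. However, I need to extract the names and spans of the matched groups (or even just the span for a given name), and I haven't found a way to do this. I'd like to do something like: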

p = re.compile(rgx, re.IGNORECASE)
m = p.match(targetstring)
# then do something to set 'all' to the list of match objects
for mo in all:
    print(mo.name() + '->' + mo.span())

For example, the input string 'ABCDEFHIJK' should produce the output:

'foo'  -> (0, 3)
'bar'  -> (3, 6)
'norf' -> (6, 10)

Thanks!

Answer 1

Iterate over the names of the matched groups (the keys of groupdict) and print the corresponding span:

import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJKLM')

# groupdict() maps each group name to its matched text
for key in m.groupdict():
    print(key, m.span(key))

This prints:

foo (0, 3)
bar (3, 6)
norf (6, 10)

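Edit: since dictionary keys are not guaranteed to be ordered (notably in older Python versions), you may want to choose the iteration order explicitly. In the example below, sorted(...) is the list of group names sorted by each group's span tuple:

for key in sorted(m.groupdict(), key=m.span):
    print(key, m.span(key))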
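
As a self-contained Python 3 sketch of the approach above, the snippet below collects (name, span) pairs and orders them by span. The helper name named_spans is hypothetical and not part of the re module. Note that an optional group that did not take part in the match reports the span (-1, -1), so it sorts first.

```python
import re

RGX = r'(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'

def named_spans(match):
    """Return (name, span) pairs for every named group, ordered by span."""
    # match.span(name) is (-1, -1) for a group that did not participate.
    pairs = ((name, match.span(name)) for name in match.groupdict())
    return sorted(pairs, key=lambda pair: pair[1])

m = re.compile(RGX, re.IGNORECASE).match('ABCDEFHIJK')
for name, span in named_spans(m):
    print(name, '->', span)
```

For the input 'ABCDEFHIJK' this prints foo -> (0, 3), bar -> (3, 6) and norf -> (6, 10), one per line.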

Answer 2

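You can use RegexObject.groupindex, which maps each group name to its group number and is reachable from a match as m.re.groupindex: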

p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJK')

# sort the (name, number) pairs by group number to follow pattern order
for name, n in sorted(m.re.groupindex.items(), key=lambda x: x[1]):
    print(name, m.group(n), m.span(n))
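
One case the groupindex loop does not cover is the optional bar group failing to match. The following is a minimal Python 3 sketch, assuming you want such groups left out of the result; the function name spans_in_pattern_order is hypothetical.

```python
import re

def spans_in_pattern_order(match):
    """Map each participating named group to its span, in pattern order."""
    by_index = sorted(match.re.groupindex.items(), key=lambda item: item[1])
    # match.group(n) is None when group n did not take part in the match.
    return {name: match.span(n) for name, n in by_index
            if match.group(n) is not None}

pattern = re.compile(r'(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)', re.IGNORECASE)
print(spans_in_pattern_order(pattern.match('ABCHIJK')))
```

With the input 'ABCHIJK', bar is skipped and the result holds only foo at (0, 3) and norf at (3, 7). Dropping the is not None filter would instead report bar with the span (-1, -1).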

Extracting the names and spans of regex match groups
