Regex: combining two groups

Time: 2017-12-09 22:51:23

Tags: python regex string

Test string:

First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here

I want to return a single group, "MICKEY MOUSE".

I have:

 (?:First\WName:)\W((.+)\W(?:((.+\W){1,4})(?:Last\WName:\W))(.+))

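Group 2 returns MICKEY and group 5 returns MOUSE.

I thought that wrapping them in one group, and using ?: to make the cruft in the middle and the "Last Name:" segment non-capturing, would keep those parts out of the result. But group 1 returns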

MICKEY One to four lines of cruft go here Last Name: MOUSE

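How do I remove the middle content from what is returned (or combine groups 2 and 5 into a single named or numbered group)?

3 answers:

Answer 0: (score: 1)

To solve this, you can use non-capturing groups in the regex. They are declared with (?:...)

Change the regex to: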

(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))

In Python you can then do the following:

import re

inp = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
query = r'(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))'
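# Use re.search, not re.match: inp starts with a newline, so the pattern does not match at position 0.
m = re.search(query, inp)
# Group 1 still spans everything from MICKEY to MOUSE, so join only groups 2 and 3.
output = ' '.join(m.group(2, 3))  # 'MICKEY MOUSE'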
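
A capturing group always covers one contiguous stretch of the input, so no arrangement of non-capturing groups can make a single group skip the cruft in the middle. As a minimal sketch (the group names `first` and `last` and the helper `full_name` are illustrative, not from the question), the two names can be captured separately and stitched together with `Match.expand`:

```python
import re

# "first" and "last" are illustrative group names, not part of the question.
PATTERN = re.compile(
    r'First\WName:\W(?P<first>.+)\W'   # first name on the line after "First Name:"
    r'(?:.+\W){1,4}'                   # one to four lines of cruft, not captured
    r'Last\WName:\W(?P<last>.+)'       # last name on the line after "Last Name:"
)

def full_name(text):
    """Return 'FIRST LAST', or None when the pattern does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # expand() fills the template from the captured groups.
    return m.expand(r'\g<first> \g<last>')

sample = (
    "First\nName:\nMICKEY\nOne to\nfour lines\nof cruft go here\n"
    "Last\nName:\nMOUSE\nMore cruft\ngoes here\n"
)
print(full_name(sample))
```

Named groups also keep working if the pattern is later reordered, which numbered groups would not.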

Answer 1: (score: 0)

You can split the string into lines and keep the ones that consist entirely of uppercase letters:

import re
s = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
final_data = ' '.join(i for i in s.split('\n') if re.findall('^[A-Z]+$', i))

Output:

'MICKEY MOUSE'

Or, a pure regex solution:

# The lookbehind was empty, (?<=), which matches anywhere; require a preceding newline instead.
new_data = ' '.join(re.findall(r'(?<=\n)[A-Z]+(?=\n)', s))

Output:

'MICKEY MOUSE'
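
The same idea can be written another way, offered here as a sketch rather than the answerer's method: with the `re.MULTILINE` flag, `^` and `$` anchor to each line, so the split-and-filter step collapses into a single `re.findall` call. Like the versions above, it assumes the two names are the only lines made up entirely of uppercase letters; the function name `uppercase_lines` is hypothetical.

```python
import re

def uppercase_lines(text):
    """Join every line that consists solely of the letters A-Z."""
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    return ' '.join(re.findall(r'^[A-Z]+$', text, flags=re.MULTILINE))

sample = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
print(uppercase_lines(sample))
```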

Answer 2: (score: 0)

Use the re.search() function with a specific regex pattern:

import re

s = '''
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here'''

result = re.search(r'Name:\n(?P<firstname>\S+)[\s\S]*Name:\n(?P<lastname>\S+)', s).groupdict()
print(result)

Output:

{'firstname': 'MICKEY', 'lastname': 'MOUSE'}

----------

It is even simpler with the re.findall() function:

result = re.findall(r'(?<=Name:\n)(\S+)', s)
print(result)

Output:

['MICKEY', 'MOUSE']
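
Both snippets assume the pattern is present in the input: re.search() returns None when it is not, and calling .groupdict() on None raises AttributeError. A hedged sketch of a guarded wrapper (the name `extract_name` is hypothetical) that also joins the two parts into the single string the question asks for:

```python
import re

# Match the non-whitespace token on the line right after any "Name:" label.
NAME_AFTER_LABEL = re.compile(r'(?<=Name:\n)\S+')

def extract_name(text):
    """Return 'FIRST LAST', or None unless exactly two labelled names are found."""
    parts = NAME_AFTER_LABEL.findall(text)
    if len(parts) != 2:
        return None
    return ' '.join(parts)

sample = '''
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here'''
print(extract_name(sample))
```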