Question

Test string:

First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here

I want to return a single group containing "MICKEY MOUSE".

I have:

 (?:First\WName:)\W((.+)\W(?:((.+\W){1,4})(?:Last\WName:\W))(.+))

Group 2 returns MICKEY and group 5 returns MOUSE.

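I thought that wrapping everything in one group, and using ?: to turn the intervening cruft and the last-name label into non-capturing groups, would keep them out of the result. But group 1 returns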

MICKEY One to four lines of cruft go here Last Name: MOUSE

How can I strip the intervening content from what is returned (or combine groups 2 and 5 into a single named or numbered group)?
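A capturing group always records one contiguous slice of the input, and "MICKEY" and "MOUSE" are separated by the cruft, so no single group can hold "MICKEY MOUSE" directly. A minimal sketch, assuming Python's re module and the test string above (stored in a hypothetical variable `text`), that runs the question's regex unchanged, shows what group 1 spans, and joins groups 2 and 5 after matching:

```python
import re

# Test string from the question (variable name is illustrative)
text = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""

# The regex from the question, unchanged
pattern = r'(?:First\WName:)\W((.+)\W(?:((.+\W){1,4})(?:Last\WName:\W))(.+))'

# re.search rather than re.match: the string begins with a newline, not "First"
m = re.search(pattern, text)
print(m.re.groups)        # total number of capturing groups in the pattern
print(repr(m.group(1)))   # spans everything from MICKEY to MOUSE, cruft included

# Combine the two name groups after matching
full_name = ' '.join(m.group(2, 5))
print(full_name)          # MICKEY MOUSE
```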

Answer 1

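To solve this, you can use non-capturing groups in the regex. They are written as (?:...). A single group still cannot skip over the cruft, so the two names come back as separate groups and are joined after matching.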

Change the regex to:

(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))

In Python you can then do the following:

import re

inp = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
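query = r'(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))'
# re.search, not re.match: inp starts with a newline, so a match anchored at position 0 fails
m = re.search(query, inp)
# Group 1 still spans everything from MICKEY to MOUSE; join only the name groups 2 and 3
output = ' '.join(m.group(2, 3))  # 'MICKEY MOUSE'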
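The ?: markers only change which parentheses produce groups; they do not let one group skip text. A quick sketch comparing the number of capturing groups (the `groups` attribute of a compiled pattern) in the question's regex and in the regex from this answer:

```python
import re

question_re = re.compile(r'(?:First\WName:)\W((.+)\W(?:((.+\W){1,4})(?:Last\WName:\W))(.+))')
answer_re = re.compile(r'(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))')

# Number of capturing groups in each pattern
print(question_re.groups)  # 5: outer, first name, cruft block, one cruft line, last name
print(answer_re.groups)    # 3: outer, first name, last name

text = "First\nName:\nMICKEY\nOne to\nfour lines\nof cruft go here\nLast\nName:\nMOUSE\n"
m = answer_re.search(text)
# The outer group 1 still contains the cruft; only groups 2 and 3 hold the names
print(m.group(2, 3))  # ('MICKEY', 'MOUSE')
```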

Answer 2

You can split the string into lines and keep only the lines whose characters are all uppercase:

import re
s = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
# keep only the lines made up entirely of uppercase letters
final_data = ' '.join(i for i in s.split('\n') if re.findall('^[A-Z]+$', i))

Output:

'MICKEY MOUSE'

Alternatively, a pure-regex solution:

# (?<=\n) and (?=\n): the uppercase run must sit between two newlines
new_data = ' '.join(re.findall(r'(?<=\n)[A-Z]+(?=\n)', s))

Output:

'MICKEY MOUSE'
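The line-by-line filter can also be written as a single findall call. A sketch using the re.MULTILINE flag, under which ^ and $ match at the start and end of every line, so `^[A-Z]+$` keeps exactly the lines made only of uppercase letters:

```python
import re

s = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""

# With re.MULTILINE, ^ and $ anchor at every line boundary, not just the string ends
names = re.findall(r'^[A-Z]+$', s, re.MULTILINE)
print(names)            # ['MICKEY', 'MOUSE']
print(' '.join(names))  # MICKEY MOUSE
```

Like the split-based version, this assumes the names are the only lines written entirely in capital letters; an all-caps line in the cruft would be returned as well.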

Answer 3

Use re.search() with a pattern containing named groups:

import re

s = '''
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here'''

result = re.search(r'Name:\n(?P<firstname>\S+)[\s\S]*Name:\n(?P<lastname>\S+)', s).groupdict()
print(result)

Output:

{'firstname': 'MICKEY', 'lastname': 'MOUSE'}
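Because the groups are named, the two parts can be combined into the single "MICKEY MOUSE" string the question asks for. A minimal sketch reusing the same pattern:

```python
import re

s = '''
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here'''

m = re.search(r'Name:\n(?P<firstname>\S+)[\s\S]*Name:\n(?P<lastname>\S+)', s)
# Read each named group by name and join them with a space
full_name = f"{m.group('firstname')} {m.group('lastname')}"
print(full_name)  # MICKEY MOUSE
```

The greedy `[\s\S]*` runs to the end of the string and backtracks to the last "Name:" line, so lastname always comes from the final occurrence.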

----------

Using re.findall() is even simpler:

result = re.findall(r'(?<=Name:\n)(\S+)', s)
print(result)

Output:

['MICKEY', 'MOUSE']
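findall returns a list, so joining it produces the combined string. Python's re module only accepts fixed-width look-behind, which is why the pattern spells out exactly one \n after "Name:". A sketch showing both points; the variable-width pattern `(?<=Name:\s+)` is a hypothetical counterexample, not part of the answer:

```python
import re

s = "\nFirst\nName:\nMICKEY\nOne to\nfour lines\nof cruft go here\nLast\nName:\nMOUSE\nMore cruft\ngoes here"

names = re.findall(r'(?<=Name:\n)(\S+)', s)
print(' '.join(names))  # MICKEY MOUSE

# A variable-width look-behind such as \s+ is rejected when the pattern is compiled
try:
    re.compile(r'(?<=Name:\s+)\S+')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print('re.error:', exc)
```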

Regex: combining two groups
