Test string:
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
I want to return a single group containing "MICKEY MOUSE".
I have:
(?:First\WName:)\W((.+)\W(?:((.+\W){1,4})(?:Last\WName:\W))(.+))
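Group 2 returns MICKEY and group 5 returns MOUSE.
I thought that wrapping them in one group, and using ?: to make the middle cruft and the Last Name section non-capturing, would keep them from appearing. But group 1 returns: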
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
How can I remove the middle content from what is returned (or combine groups 2 and 5 into one named or numbered group)?
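Answer 0 (score: 1)
A capturing group always returns one contiguous span of the match, so group 1 cannot skip the cruft in the middle. What you can do is make the inner groups non-capturing, declared as (?:...), so that only the two names are captured (as groups 2 and 3), and then join them in code.
Modify the regex to: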
(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))
You can then do the following in Python:
import re
inp = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
query = r'(?:First\WName:)\W((.+)\W(?:(?:(?:.+\W){1,4})(?:Last\WName:\W))(.+))'
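# re.search, not re.match: inp begins with a newline, so a match anchored at position 0 fails
m = re.search(query, inp)
# group 1 still spans the cruft, so join only groups 2 and 3
output = ' '.join(m.group(2, 3))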
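Since the question also asks about named groups, here is a minimal sketch of the same idea with named groups. The group names `first` and `last` and the helper `full_name` are illustrative assumptions, not part of the original answer. The combining still happens outside the pattern, here with `Match.expand`.

```python
import re

text = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""

# Same structure as the pattern above, but only the two names are captured.
# The group names "first" and "last" are assumptions made for this sketch.
pattern = re.compile(
    r'First\WName:\W(?P<first>.+)\W(?:.+\W){1,4}Last\WName:\W(?P<last>.+)'
)

def full_name(s):
    """Return 'FIRST LAST', or None if the pattern does not match."""
    m = pattern.search(s)  # search, because the text starts with a newline
    if m is None:
        return None
    # Build the combined string from the two named groups.
    return m.expand(r'\g<first> \g<last>')

print(full_name(text))
```

Returning None on a failed match avoids the AttributeError that calling a method on a failed search would raise.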
Answer 1 (score: 0)
You can split the string into lines and keep only the lines that consist entirely of uppercase letters:
import re
s = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""
final_data = ' '.join(i for i in s.split('\n') if re.findall('^[A-Z]+$', i))
Output:
'MICKEY MOUSE'
Or, a pure regex solution:
new_data = ' '.join(re.findall(r'(?<=\n)[A-Z]+(?=\n)', s))
Output:
'MICKEY MOUSE'
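The uppercase check can also be done without a regex. The following is a rough sketch of that variant, not the answer's own method. The helper name `uppercase_lines` is an assumption, and `str.isalpha()` accepts non-ASCII letters, so it is slightly more permissive than `[A-Z]+`. Like the approach above, it only works while the names are the only all-uppercase lines in the input.

```python
s = """
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here
"""

def uppercase_lines(text):
    """Return the lines made up only of uppercase letters (hypothetical helper)."""
    # isalpha() rejects empty lines, digits, spaces and punctuation;
    # isupper() rejects lines that contain any lowercase letter.
    return [line for line in text.splitlines()
            if line.isalpha() and line.isupper()]

final_data = ' '.join(uppercase_lines(s))
print(final_data)
```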
Answer 2 (score: 0)
Using the re.search() function with a specific regex pattern:
import re
s = '''
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here'''
result = re.search(r'Name:\n(?P<firstname>\S+)[\s\S]*Name:\n(?P<lastname>\S+)', s).groupdict()
print(result)
Output:
{'firstname': 'MICKEY', 'lastname': 'MOUSE'}
----------
Using the re.findall() function is simpler:
result = re.findall(r'(?<=Name:\n)(\S+)', s)
print(result)
Output:
['MICKEY', 'MOUSE']
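To get the single string the question asks for, the list can be joined. A small sketch, assuming exactly two "Name:" labels are expected in the input; the helper name `join_names` is hypothetical:

```python
import re

s = '''
First
Name:
MICKEY
One to
four lines
of cruft go here
Last
Name:
MOUSE
More cruft
goes here'''

def join_names(text):
    """Return 'FIRST LAST', or None unless exactly two names are found."""
    # The lookbehind matches after any "Name:" label followed by a newline,
    # so both the first name and the last name are picked up.
    names = re.findall(r'(?<=Name:\n)(\S+)', text)
    if len(names) != 2:
        return None
    return ' '.join(names)

print(join_names(s))
```

The length check is a design choice for this sketch: it refuses input with missing or extra "Name:" labels instead of returning a partial result.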