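How can I use a regular expression to restrict which parts of the text are searched? Given the example below, say I want to get the details for Customer02. If I use
Name:\s*(.+)
then obviously I get 3 results. So I want to restrict the search to the details under Customer02 only, and stop when it reaches Customer03. I could of course index into the results (i.e. results = ['Mr Smith', 'Mr Jones', 'Mr Brown'], so results[1]), but that seems clumsy.
[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312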
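Answer 0 (score: 3)
This isn't the kind of problem regular expressions are meant to solve. Your best bet is to parse the data into a structure first (possibly using regular expressions to help "chunk" the data).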
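One way to read this suggestion is sketched below: split the text into one dictionary per customer, then look up the record you need. The function name parse_customers and the SAMPLE string are assumptions made for illustration; they are not part of the answer.

```python
import re

SAMPLE = """[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312
"""

def parse_customers(text):
    """Return {section: {field: value}} for '[Section]' / 'Field: value' text."""
    customers = {}
    current = None
    for line in text.splitlines():
        header = re.match(r'\[(\w+)\]\s*$', line)
        if header:
            # A new '[CustomerNN]' header starts a new record.
            current = customers.setdefault(header.group(1), {})
        elif current is not None and ':' in line:
            # Split on the first colon only, so values may contain colons.
            field, _, value = line.partition(':')
            current[field.strip()] = value.strip()
    return customers

customers = parse_customers(SAMPLE)
print(customers['Customer02']['Name'])  # Mr Jones
```

Once the data is in this shape, restricting the search to one customer is just a dictionary lookup, and an empty field such as Customer02's telephone comes back as an empty string.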
Answer 1 (score: 1)
What format is the data in? Is it a single string? If efficiency isn't a major concern, the obvious approach is to slice the string:
import re
start = cdata.find("[Customer02]")
end = cdata.find("[Customer03]")
# group(1) is the captured name
result = re.search(r'Name:\s*(.+)', cdata[start:end]).group(1)
Or more concisely:
name = re.search(r'Name:\s*(.+)', cdata[cdata.find("[Customer02]"):cdata.find("[Customer03]")]).group(1)
Edit: or with error checking:
start = cdata.find("[Customer02]")
end = cdata.find("[Customer03]")
result = re.search(r'Name:\s*(.+)', cdata[start:end])
if result: name = result.group(1)
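One thing to watch with this approach: str.find returns -1 when a marker is missing, which silently produces the wrong slice. A slightly more defensive sketch follows. The helper name name_between and the sample value of cdata are assumptions for illustration.

```python
import re

cdata = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312\n"
)

def name_between(text, start_marker, end_marker):
    """Return the Name found between two markers, or None if there is none."""
    start = text.find(start_marker)
    if start == -1:
        return None  # the section is not present at all
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        end = len(text)  # no later marker: search to the end of the text
    match = re.search(r'Name:\s*(.+)', text[start:end])
    return match.group(1) if match else None

print(name_between(cdata, "[Customer02]", "[Customer03]"))  # Mr Jones
```

Falling back to the end of the text when the closing marker is absent also makes the last section searchable, for example name_between(cdata, "[Customer03]", "[Customer04]").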
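Answer 2 (score: 1)
If you know the specific boundary you want to search from, and you want the capture group, why not do this:
import re
text = "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
# The brackets are escaped so that [Customer02] is matched literally, not as a character class.
blah = re.search(r"\[Customer02\]\nName:\s*(.*?)\n", text)
print(blah.group(1))
This returns "Mr Jones", which I think is what you want.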
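If the customer id is only known at run time, the header has to be escaped before it goes into the pattern, since square brackets otherwise denote a character class. A sketch using re.escape follows; the helper name customer_name is hypothetical.

```python
import re

text = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

def customer_name(text, section):
    """Return the Name on the line right after '[section]', or None."""
    # re.escape makes the brackets (and anything else special) literal.
    header = re.escape("[%s]" % section)
    # [ \t]* skips spaces after the colon without crossing onto the next line.
    match = re.search(header + r"\nName:[ \t]*(.*)", text)
    return match.group(1) if match else None

print(customer_name(text, "Customer02"))  # Mr Jones
```

Because the header is part of the pattern, the match is anchored to the right section without any slicing or result indexing.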
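Answer 3 (score: 0)
The module-level functions of the re module (such as re.search) cannot restrict the range that is searched. If you already know the indices you want to restrict it to, you can match against a substring.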
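A short sketch of what matching against a substring looks like once the indices are known. For comparison it also shows that compiled pattern objects accept optional pos and endpos arguments in their search method, which bound the search without copying the string. The variable names and sample text are assumptions.

```python
import re

text = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

start = text.find("[Customer02]")
end = text.find("[Customer03]")

# Option 1: match against a substring (works with the module-level functions).
sliced = re.search(r"Name:\s*(.+)", text[start:end])

# Option 2: a compiled pattern's search() takes pos and endpos bounds.
pattern = re.compile(r"Name:\s*(.+)")
bounded = pattern.search(text, start, end)

print(sliced.group(1))   # Mr Jones
print(bounded.group(1))  # Mr Jones
```

With option 2 the match positions (bounded.start(), bounded.end()) refer to the full string, whereas with slicing they are relative to the substring.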
Answer 4 (score: 0)
Does the following work for you?
ch = """
[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312
[Customer04]
Name: Mr Acarid
Address: Carpet
Telephone: 88864592
[Customer05]
Name: Mr Johannes
Address: Zuidersee
Telephone:
[Customer06]
Name: Mr Bringt
Address: Babylon
Telephone: 077747812
[Customer07]
Name: Ms Amanda
Address: Madrid
Telephone: 187354988
[Customer88]
Name: Ms Heighty
Address: Cairo
Telephone: 11128
"""
import re
blah = '''Enter the characteristics you want the items to be selected upon :
- the Customer's numbers (separated by commas) : '''
# Selection criteria typed by the user, keyed by field name
must = {'Customer': re.findall(r'0*(\d+)', input(blah)), 'Name': [], 'Address': [], 'Telephone': []}
while True:
    y = input('- strings desired in the Names (void to finish) : ')
    if y: must['Name'].append(y)
    else: break
while True:
    y = input('- strings desired in the Addresses (void to finish) : ')
    if y: must['Address'].append(y)
    else: break
while True:
    y = input('- strings desired in the Telephone numbers (void to finish) : ')
    if y: must['Telephone'].append(y)
    else: break
# One match per customer record; the named groups capture each field
pat = re.compile(r'\[Customer0*(?P<Customer>\d+)].*\nName:(?P<Name>.*)\nAddress:(?P<Address>.*)\nTelephone:(?P<Telephone>.*)')
print(ch, '\n\nmust==', must, '\n\n')
# Keep a record if any criterion is met: customer numbers must be equal,
# the other fields only need to contain the given string
print('\n'.join(repr(match.groups()) for match in pat.finditer(ch)
                if any((x == match.group(k) if k == 'Customer' else x in match.group(k))
                       for k in must for x in must[k])))
For example, with the input
must== {'Customer': ['003', '8', '6'], 'Telephone': ['645'], 'Name': [], 'Address': ['Laa']}
the result is
('2', ' Mr Jones ', ' Laandon ', ' ')
('3', ' Mr Brown ', ' Bibble ', ' 077764312')
('4', ' Mr Acarid', ' Carpet ', ' 88864592')
('6', ' Mr Bringt ', ' Babylon ', ' 077747812')
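Note that this result contains no section for Customer88, even though '8' was given as a desired number. That is because the customer number is tested with
x==match.group(k) if k=='Customer'
while the other fields are tested with
x in match.group(k)
hence the "A if condition_upon_k else B" expression.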
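The difference between the two tests can be seen in isolation. The helper name wanted and the sample values below are made up for illustration.

```python
def wanted(key, needle, value):
    """Customer numbers must match exactly; other fields only need to contain needle."""
    return (needle == value) if key == 'Customer' else (needle in value)

print(wanted('Customer', '8', '88'))            # False: '8' is not equal to '88'
print(wanted('Customer', '8', '8'))             # True
print(wanted('Telephone', '645', ' 88864592'))  # True: substring match
```

A plain substring test on the customer number would have selected Customer88 for the input '8', which is why that key gets the equality branch.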