Question

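How can I use a regular expression to limit which parts of the text are searched? Given the example below, say I want to get the details for Customer02. If I use

Name:\s*(.+)

then I obviously get 3 results. So I want to limit the search to the details under Customer02 and stop when it reaches Customer03. I could of course use the result index (i.e. result = ['Mr Smith', 'Mr Jones', 'Mr Brown'], so result[1]), but that seems clumsy.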

[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312

Answer 1

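This isn't a problem that regular expressions are meant to solve. Your best bet is to parse the data into a structure first (possibly using regular expressions to help "chunk" the data).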
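A minimal sketch of that approach, assuming the data is one string in the format shown in the question. The helper name `parse_customers` and the dict-of-dicts layout are illustrative choices, not something the answer specifies.

```python
import re

# Sample data in the question's format.
text = """[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312
"""

def parse_customers(data):
    """Parse the text into {section: {field: value}} (hypothetical helper)."""
    # With one capturing group, re.split returns
    # [text_before_first_header, header1, body1, header2, body2, ...].
    parts = re.split(r'^\[(\w+)\][ \t]*$', data, flags=re.MULTILINE)
    customers = {}
    for header, body in zip(parts[1::2], parts[2::2]):
        # Each field line looks like "Key: value"; the value may be empty.
        fields = re.findall(r'^(\w+):[ \t]*(.*?)[ \t]*$', body, flags=re.MULTILINE)
        customers[header] = dict(fields)
    return customers

customers = parse_customers(text)
print(customers['Customer02']['Name'])
```

Once the data is in a dictionary, looking up a single customer no longer needs any range-limited search.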

Answer 2

What format is the data in? Is it a single string? If efficiency isn't a major concern, the obvious thing to do is slice the string:

import re

start = cdata.find("[Customer02]")
end = cdata.find("[Customer03]")
# group(1) is the captured name; group(0) would be the whole "Name: ..." match
result = re.search(r'Name:\s*(.+)', cdata[start:end]).group(1)

Or more concisely:

name = re.search(r'Name:\s*(.+)', cdata[cdata.find("[Customer02]"):cdata.find("[Customer03]")]).group(1)

Edit: or with error checking:

start = cdata.find("[Customer02]")
end = cdata.find("[Customer03]")
result = re.search(r'Name:\s*(.+)', cdata[start:end])
if result:
    name = result.group(1)
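One thing the slicing approach has to watch for is that `str.find` returns -1 when the marker is absent, which silently produces the wrong slice. Here is a sketch of a wrapper that handles this; the function name `name_in_section` and its arguments are hypothetical.

```python
import re

cdata = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312\n"
)

def name_in_section(data, header, next_header):
    """Return the Name found between two headers, or None (hypothetical helper)."""
    start = data.find(header)
    if start == -1:
        return None              # the section is not present at all
    end = data.find(next_header, start + len(header))
    if end == -1:
        end = len(data)          # no following header: search to the end
    match = re.search(r'Name:\s*(.+)', data[start:end])
    return match.group(1) if match else None

print(name_in_section(cdata, "[Customer02]", "[Customer03]"))
```

The same helper also works for the last section, where there is no following header to stop at.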

Answer 3

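If you know the specific boundary you are searching for, and you want a capturing group, why not do this:

import re
text = "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
# The brackets must be escaped, otherwise [Customer02] is a character class
blah = re.search(r"\[Customer02\]\nName:\s*(.*?)\n", text)
print(blah.group(1))

This returns "Mr Jones", which I think is what you want.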
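The pattern above relies on the Name line directly following the header. As a hedged sketch of how the same idea could reach other fields in a section, the hypothetical helper `field_in_section` below skips ahead with `[^\[]*?`, which assumes that "[" only ever appears in section headers.

```python
import re

text = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

def field_in_section(data, section, field):
    """Return the value of `field` inside `[section]`, or None (hypothetical helper)."""
    # \[section\]   literal header
    # [^\[]*?       skip ahead lazily, but never past the next "["
    # \nfield:      the wanted field at the start of a line
    # [ \t]*(.*)    the rest of that line, which may be empty
    pattern = r'\[{}\][^\[]*?\n{}:[ \t]*(.*)'.format(re.escape(section), re.escape(field))
    match = re.search(pattern, data)
    return match.group(1) if match else None

print(field_in_section(text, 'Customer02', 'Address'))
```

Using `[ \t]*` rather than `\s*` after the colon is deliberate: `\s*` also matches a newline, so an empty field such as Customer02's Telephone would otherwise capture the following "[Customer03]" line.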

Answer 4

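The module-level functions in re have no way to limit the match range. If you already know the indices you want to restrict it to, you can match against a substring.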
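As a complement to matching on a substring, compiled pattern objects accept optional `pos` and `endpos` arguments in `search()`, which restrict the scan to a range of the original string without copying it. A minimal sketch, assuming the section markers from the question; the variable names are illustrative.

```python
import re

text = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

name_pattern = re.compile(r'Name:[ \t]*(.+)')

start = text.find("[Customer02]")
end = text.find("[Customer03]")
if start == -1 or end == -1:
    raise ValueError("section markers not found")

# Pattern.search(string, pos, endpos) only scans text[pos:endpos],
# but match offsets still refer to the full string.
match = name_pattern.search(text, start, end)
name = match.group(1) if match else None
print(name)
```

Because the offsets refer to the full string, `text[match.start(1):match.end(1)]` gives the same value as `match.group(1)`.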

Answer 5

Does the following work for you?

ch = """
[Customer01]
Name: Mr Smith
Address: Somewhere 
Telephone: 01234567489

[Customer02] 
Name: Mr Jones 
Address: Laandon 
Telephone: 

[Customer03] 
Name: Mr Brown 
Address: Bibble 
Telephone: 077764312

[Customer04]
Name: Mr Acarid
Address: Carpet 
Telephone: 88864592

[Customer05] 
Name: Mr Johannes 
Address: Zuidersee 
Telephone: 

[Customer06] 
Name: Mr Bringt 
Address: Babylon 
Telephone: 077747812

[Customer07] 
Name: Ms Amanda 
Address: Madrid 
Telephone: 187354988

[Customer88] 
Name: Ms Heighty 
Address: Cairo 
Telephone: 11128

"""

import re

blah = '''Enter the characteristics you want the items to be selected upon :
- the Customer's numbers (separated by commas) : '''
# Leading zeros are dropped from the customer numbers that are typed in
must = {'Customer': re.findall(r'0*(\d+)', input(blah)), 'Name': [], 'Address': [], 'Telephone': []}

while True:
    y = input('- strings desired in the Names (void to finish) : ')
    if y:
        must['Name'].append(y)
    else:
        break

while True:
    y = input('- strings desired in the Addresses (void to finish) : ')
    if y:
        must['Address'].append(y)
    else:
        break

while True:
    y = input('- strings desired in the Telephone numbers (void to finish) : ')
    if y:
        must['Telephone'].append(y)
    else:
        break

pat = re.compile(r'\[Customer0*(?P<Customer>\d+)].*\nName:(?P<Name>.*)\nAddress:(?P<Address>.*)\nTelephone:(?P<Telephone>.*)')

print(ch, '\n\nmust==', must, '\n\n')

# Keep a record if any criterion matches: customer numbers must be equal,
# the other fields only need to contain the requested string.
print('\n'.join(repr(match.groups()) for match in pat.finditer(ch)
                if any((x == match.group(k) if k == 'Customer' else x in match.group(k))
                       for k in must for x in must[k])))

For example, with the input data

must== {'Customer': ['3', '8', '6'], 'Name': [], 'Address': ['Laa'], 'Telephone': ['645']}

the result is

('2', ' Mr Jones ', ' Laandon ', ' ')
('3', ' Mr Brown ', ' Bibble ', ' 077764312')
('4', ' Mr Acarid', ' Carpet ', ' 88864592')
('6', ' Mr Bringt ', ' Babylon ', ' 077747812')

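Note that this result contains no entry for Customer88, even though '8' was given as one of the desired numbers. This is achieved by the test

x==match.group(k) if k=='Customer'

whereas for the other fields the test is

x in match.group(k)

hence the "A if condition_upon_k else B" expression.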
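To make the difference between the two tests concrete, here is a small sketch; the helper name `criterion_matches` is hypothetical and simply isolates the conditional expression used in the filter.

```python
def criterion_matches(key, wanted, actual):
    """Exact comparison for customer numbers, substring test for other fields."""
    return wanted == actual if key == 'Customer' else wanted in actual

print(criterion_matches('Customer', '8', '88'))            # False: '8' is not equal to '88'
print(criterion_matches('Customer', '8', '8'))             # True
print(criterion_matches('Telephone', '645', ' 88864592'))  # True: substring match
```

With a plain substring test on the customer number, '8' would also have selected Customer88.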

Limit a regular expression to search only between two points
