Restricting a regular expression to search only between two points

Time: 2011-01-28 17:07:51

Tags: python regex

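How can I use a regular expression to limit which parts of the text are searched? Given the example below, say I want to get the details for customer02. If I use

Name:\s*(.+)

then obviously I get 3 results. So I want to restrict the search to the details under customer02 and stop when it reaches customer03. I could of course use the index of the result (i.e. result = ['Mr Smith', 'Mr Jones', 'Mr Brown'], so result[1]), but that seems clunky.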

  

[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312

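5 answers:

Answer 0 (score: 3)

This is not really a problem that regular expressions are meant to solve. Your best bet is to parse the data into a structure first (possibly using a regex to help "chunk" the data).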
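As a rough illustration of the "parse it into a structure first" idea, here is a minimal sketch. The helper name parse_customers and the DATA constant are made up for this example, and it assumes every section starts with a "[CustomerNN]" line followed by "Key: value" lines, as in the question.

```python
import re

# Sample data in the same layout as the question.
DATA = """[Customer01]
Name: Mr Smith
Address: Somewhere
Telephone: 01234567489
[Customer02]
Name: Mr Jones
Address: Laandon
Telephone:
[Customer03]
Name: Mr Brown
Address: Bibble
Telephone: 077764312
"""


def parse_customers(text):
    """Return {section name: {field: value}} (hypothetical helper)."""
    customers = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"\[(\w+)\]$", line)
        if header:
            # A "[CustomerNN]" line starts a new section.
            current = customers.setdefault(header.group(1), {})
        elif current is not None and ":" in line:
            # "Key: value" lines belong to the most recent section.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return customers


customers = parse_customers(DATA)
print(customers["Customer02"]["Name"])
```

Once the data is in a dictionary, looking up one customer no longer depends on where the regex happens to start or stop.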

Answer 1 (score: 1)

What format is the data in? Is it a single string? If efficiency is not a major concern, the obvious thing to do is slice the string:

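import re

# cdata is assumed to hold the customer data as a single string;
# swap in "[Customer02]" and "[Customer03]" for the case in the question
start = cdata.find("[Customer01]")
end = cdata.find("[Customer02]")
result = re.search(r'Name:\s*(.+)', cdata[start:end]).group(1)  # group(1) is the captured name

Or, more concisely:

name = re.search(r'Name:\s*(.+)', cdata[cdata.find("[Customer01]"):cdata.find("[Customer02]")]).group(1)

Edit: or with error checking:

start = cdata.find("[Customer01]")
end = cdata.find("[Customer02]")
result = re.search(r'Name:\s*(.+)', cdata[start:end])
if result:
    name = result.group(1)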
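One thing the slicing approach has to watch for is that str.find returns -1 when a marker is missing, which silently produces the wrong slice. A possible way to wrap this up is sketched below; field_between is a hypothetical helper name, not something from the answer, and treating a missing end marker as "search to the end of the text" is an assumption of this sketch.

```python
import re


def field_between(text, start_marker, end_marker, pattern):
    """Search for pattern only between two markers (hypothetical helper)."""
    start = text.find(start_marker)
    if start == -1:
        return None  # start marker missing: nothing to search
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        end = len(text)  # no end marker: search to the end of the text
    match = re.search(pattern, text[start:end])
    return match.group(1) if match else None


cdata = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312\n"
)

name = field_between(cdata, "[Customer02]", "[Customer03]", r"Name:\s*(.+)")
print(name)
```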

Answer 2 (score: 1)

If you know the specific boundary you are searching from, and you want the capture group, why not just do this:

import re
text = "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
# The brackets must be escaped; unescaped, [Customer02] is a character class
blah = re.search(r"\[Customer02\]\nName:\s*(.*?)\n", text)
print(blah.group(1))

This returns "Mr Jones". I think that is what you want.
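If more than one field of the section is needed, the same idea can be stretched to capture the whole section first. The following is only a sketch: it assumes every section header looks like "[Customer" plus digits plus "]", and it uses a lazy match with a lookahead so the capture stops at the next header or at the end of the text.

```python
import re

text = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

# Capture everything after the "[Customer02]" header up to the next
# "[Customer...]" header or the end of the text.
# re.DOTALL lets "." match newlines so the capture can span several lines.
section = re.search(
    r"\[Customer02\]\n(.*?)(?=\n\[Customer\d+\]|\Z)", text, re.DOTALL
)
body = section.group(1)

# Any field can now be searched for inside the captured section only.
name = re.search(r"Name:\s*(.+)", body).group(1)
print(name)
```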

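Answer 3 (score: 0)

The module-level functions in re (such as re.search) have no argument for restricting the range that is searched. If you already know the indices you want to restrict the search to, you can match against a substring instead.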
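For completeness: compiled pattern objects do accept optional pos and endpos arguments on search, which gives a similar effect to slicing without copying the substring. A small sketch, assuming the indices come from str.find on the section headers as in the other answers:

```python
import re

text = (
    "[Customer01]\nName: Mr Smith\nAddress: Somewhere\nTelephone: 01234567489\n"
    "[Customer02]\nName: Mr Jones\nAddress: Laandon\nTelephone:\n"
    "[Customer03]\nName: Mr Brown\nAddress: Bibble\nTelephone: 077764312"
)

pattern = re.compile(r"Name:\s*(.+)")

# Indices of the two boundaries (both markers exist in this sample text).
start = text.find("[Customer02]")
end = text.find("[Customer03]")

# pos and endpos restrict the search window to text[start:end].
match = pattern.search(text, start, end)
name = match.group(1) if match else None
print(name)
```

Unlike slicing, the offsets reported by match.start() and match.end() here refer to positions in the full text.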

Answer 4 (score: 0)

Does the following work for you?

ch = """
[Customer01]
Name: Mr Smith
Address: Somewhere 
Telephone: 01234567489

[Customer02] 
Name: Mr Jones 
Address: Laandon 
Telephone: 

[Customer03] 
Name: Mr Brown 
Address: Bibble 
Telephone: 077764312

[Customer04]
Name: Mr Acarid
Address: Carpet 
Telephone: 88864592

[Customer05] 
Name: Mr Johannes 
Address: Zuidersee 
Telephone: 

[Customer06] 
Name: Mr Bringt 
Address: Babylon 
Telephone: 077747812

[Customer07] 
Name: Ms Amanda 
Address: Madrid 
Telephone: 187354988

[Customer88] 
Name: Ms Heighty 
Address: Cairo 
Telephone: 11128

"""

import re

# Python 3 version: input() replaces raw_input(), print is a function
blah = '''Enter the characteristics you want the items to be selected upon :
- the Customer's numbers (separated by commas) : '''
must = {'Customer' : re.findall(r'0*(\d+)', input(blah)), 'Name':[], 'Address':[], 'Telephone':[]}

while True:
    y = input('- strings desired in the Names (void to finish) : ')
    if y:  must['Name'].append(y)
    else:  break

while True:
    y = input('- strings desired in the Addresses (void to finish) : ')
    if y:  must['Address'].append(y)
    else:  break

while True:
    y = input('- strings desired in the Telephone numbers (void to finish) : ')
    if y:  must['Telephone'].append(y)
    else:  break

# One match per customer section, with a named group for each field
pat = re.compile(r'\[Customer0*(?P<Customer>\d+)].*\nName:(?P<Name>.*)\nAddress:(?P<Address>.*)\nTelephone:(?P<Telephone>.*)')

print(ch, '\n\nmust==', must, '\n\n')

# Keep a section if ANY criterion matches it
print('\n'.join(repr(match.groups()) for match in pat.finditer(ch)
                if any((x == match.group(k) if k == 'Customer' else x in match.group(k))
                       for k in must for x in must[k])))

For example, with the input data

must== {'Customer': ['003', '8', '6'], 'Telephone': ['645'], 'Name': [], 'Address': ['Laa']} 

the result is

('2', ' Mr Jones ', ' Laandon ', ' ')
('3', ' Mr Brown ', ' Bibble ', ' 077764312')
('4', ' Mr Acarid', ' Carpet ', ' 88864592')
('6', ' Mr Bringt ', ' Babylon ', ' 077747812')

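Note that in this result there is no section corresponding to Customer88, even though '8' was given as one of the desired numbers. This is because the customer number is compared with the equality test

x==match.group(k) if k=='Customer'

rather than with the substring test used for the other fields

x in match.group(k)

hence the "A if condition_upon_k else B" expression.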
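The difference between the two tests can be seen in isolation with a small sketch. The function name matches and the two sample records are invented for this illustration; the filtering expression itself mirrors the one in the answer.

```python
def matches(criteria, record):
    """True if any criterion matches the record (hypothetical helper)."""
    return any(
        # Exact comparison for the customer number, substring test otherwise.
        (x == record[k]) if k == "Customer" else (x in record[k])
        for k in criteria
        for x in criteria[k]
    )


criteria = {"Customer": ["8"], "Address": ["Laa"]}
rec88 = {"Customer": "88", "Address": " Cairo "}
rec2 = {"Customer": "2", "Address": " Laandon "}

print(matches(criteria, rec88))  # '8' != '88', and 'Laa' is not in ' Cairo '
print(matches(criteria, rec2))   # 'Laa' is a substring of ' Laandon '
print("8" in "88")               # a substring test would have accepted Customer88
```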