Regex with multiple matches - Python

Time: 2016-07-13 20:48:14

Tags: python regex

I searched the web for similar questions but could not solve this.

Here is an address:

  

the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011

Using the following regex in Python, I am trying to find all possible main addresses in the line above:

re.findall(r'^(.*)(\b\d+\b)(.+)(\bst\b|\bste\b)(.*)$', 'the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011')

I get this result:

[('the fashion potential hq ', '116', ' w 23rd st ', 'ste', ' 5 5th floor new york ny 10011')]

I also want the result to include: ('the fash....', '116', 'w 23rd ', 'st', 'ste 5 5th....'). I expected findall to do this, but it does not. Any help is greatly appreciated.

To be clear, this is the output I want (or something similar that contains all the possibilities): [ ('the fashion potential hq ', '116', ' w 23rd ', 'st', 'ste 5 5th floor new york ny 10011'), ('the fashion potential hq ', '116', ' w 23rd st ', 'ste', ' 5 5th floor new york ny 10011')]


1 answer:

Answer 0: (score: 0)

You need to run two regexes: one with a lazy dot and one with a greedy dot.
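A single call cannot return both splits because the pattern is anchored with ^ and $, so it matches the whole line at most once, and re.findall only reports non-overlapping matches. The minimal sketch below illustrates this on a shortened, made-up string (the `text` value and the simplified patterns are assumptions for illustration, not the address from the question): the greedy group runs to the last suffix token, the lazy group stops at the first.

```python
import re

# Made-up sample string, chosen only to keep the example short.
text = "1 a st b ste c"

# An anchored pattern can match the line only once, so findall yields one tuple.
anchored = re.findall(r'^\d(.+)\b(ste|st)\b(.*)$', text)
print(anchored)  # [(' a st b ', 'ste', ' c')]

# Greedy (.+) consumes as much as possible, then backtracks to the LAST token.
greedy = re.search(r'\d(.+)\b(ste|st)\b', text)
print(greedy.groups())  # (' a st b ', 'ste')

# Lazy (.+?) consumes as little as possible and stops at the FIRST token.
lazy = re.search(r'\d(.+?)\b(ste|st)\b', text)
print(lazy.groups())  # (' a ', 'st')
```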

The first one uses a greedy dot in the third group:

^(.*?)(\b\d+\b)(.+)\b(ste|st|ave|blvd)\b\s*(.*)$

The second one uses a lazy dot in the third group:

^(.*?)(\b\d+\b)(.+?)\b(ste|st|ave|blvd)\b\s*(.*)$
                ^^^    ^^^^^^^^^^^^^^^

Output of the lazy pattern, one captured group per line:

the fashion potential hq 
116
 w 23rd 
st
ste 5 5th floor new york ny 10011

Python sample code

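import re

# Lazy (.+?): the third group stops at the first suffix token ("st").
p = re.compile(r'^(.*?)(\b\d+\b)(.+?)\b(ste|st|ave|blvd)\b\s*(.*)$')
# Greedy (.+): the third group extends to the last suffix token ("ste").
p2 = re.compile(r'^(.*?)(\b\d+\b)(.+)\b(ste|st|ave|blvd)\b\s*(.*)$')
s = "the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011"
m = p.search(s)   # lazy match
n = p2.search(s)  # greedy match
# Print both splits only when both patterns matched.
if m and n:
    print([m.groups(), n.groups()])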

Result:

[
   ('the fashion potential hq ', '116', ' w 23rd ', 'st', 'ste 5 5th floor new york ny 10011'), 
   ('the fashion potential hq ', '116', ' w 23rd st ', 'ste', '5 5th floor new york ny 10011')
 ]
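The two patterns return only the first and the last possible split. If a line contains three or more suffix tokens and every split is wanted, one option is to match the leading part once and then iterate over each token occurrence. The following is only a sketch under that assumption; the helper name `all_splits` and the `HEAD`/`SUFFIX` patterns are hypothetical and not part of the answer above.

```python
import re

# Hypothetical helper patterns (assumptions, not from the answer above).
HEAD = re.compile(r'^(.*?)\b(\d+)\b')            # text before the first standalone number
SUFFIX = re.compile(r'\b(ste|st|ave|blvd)\b')    # same token list as the answer's regexes

def all_splits(s):
    """Return one 5-tuple per suffix token found after the leading number."""
    head = HEAD.match(s)
    if not head:
        return []
    results = []
    # Scan for tokens starting right after the number.
    for tok in SUFFIX.finditer(s, head.end()):
        middle = s[head.end():tok.start()]
        if not middle:
            continue  # mirror (.+): require at least one character
        rest = s[tok.end():].lstrip()  # mirror \s* before the last group
        results.append((head.group(1), head.group(2), middle, tok.group(1), rest))
    return results

s = "the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011"
splits = all_splits(s)
for t in splits:
    print(t)
```

For the sample address this produces the same two tuples as the lazy and greedy patterns, in that order.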