Regex for finding the contact page

Time: 2016-10-06 22:28:40

Tags: html regex

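I am trying to write a regular expression that extracts the URL from the href attribute of an <a> tag when the link text is "Contact". This is the expression I came up with: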

(?<=href=").*?(?=".*>[Cc]ontact)

It works fine when every href is on its own line, like this:

<div class="collapse navbar-collapse" role="navigation">
<ul class="navbar-right nav navbar-nav">
<li><a href="http://www.test.com/page1">Page1</a></li>
<li><a href="http://www.test.com/page2">Page2</a></li>
<li><a href="http://www.test.com/page3">Page3</a></li></ul></li>
<li><a href="http://www.test.com/contact">Contact</a></li>
<li><a href="http://www.test.com/page4">Page4<span class="caret"></span></a>
</ul>
</div>

Result:

http://www.test.com/contact

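However, if the markup is not nicely formatted and several hrefs sit on the same line, it matches every URL up to and including the contact one, not just the contact URL. How can I fix this?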

<div class="collapse navbar-collapse" role="navigation"><ul class="navbar-right nav navbar-nav"><li><a href="http://www.test.com/page1">Page1</a></li><li><a href="http://www.test.com/page2">Page2</a></li><li><a href="http://www.test.com/page3">Page3</a></li></ul></li><li><a href="http://www.test.com/contact">Contact</a></li><li><a href="http://www.test.com/page4">Page4<span class="caret"></span></a></ul></div>

Result:

http://www.test.com/page1
http://www.test.com/page2
http://www.test.com/page3
http://www.test.com/contact
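
The behavior can be reproduced with a short script. This is a minimal sketch that assumes Python's `re` module (no regex engine is named here) and a shortened version of the markup above. By default `.` does not match a newline, so with one link per line the lookahead cannot reach "Contact" from an earlier line. Once everything is on one line, `.*` runs across the intervening tags.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r'(?<=href=").*?(?=".*>[Cc]ontact)')

# Shortened sample markup (an assumption made for brevity).
multi_line = '''<li><a href="http://www.test.com/page1">Page1</a></li>
<li><a href="http://www.test.com/contact">Contact</a></li>
<li><a href="http://www.test.com/page4">Page4</a></li>'''

# Same markup with the line breaks removed.
single_line = multi_line.replace("\n", "")

# One link per line: only the contact URL is found.
print(ORIGINAL.findall(multi_line))

# Everything on one line: every URL that precedes "Contact" is found too.
print(ORIGINAL.findall(single_line))
```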

1 answer:

Answer 0 (score: 0)

You can use this regex:

(?<=href=")[^"]*?(?="[^<]*>[Cc]ontact)

Explanation:

(?<=href=")            # lookbehind: the match must be preceded by href="
[^"]*?                 # the URL itself: any characters except "
(?="[^<]*>[Cc]ontact)  # lookahead: the closing ", the rest of the same tag (no < allowed), then > followed by Contact or contact
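
A quick way to check the pattern is to run it against single-line markup. The sketch below assumes Python's `re` module and a shortened copy of the navigation markup. The `nested` string is a hypothetical extra case, not taken from the question.

```python
import re

# The pattern from this answer, unchanged.
FIXED = re.compile(r'(?<=href=")[^"]*?(?="[^<]*>[Cc]ontact)')

# Shortened single-line markup (an assumption made for brevity).
html = ('<ul><li><a href="http://www.test.com/page1">Page1</a></li>'
        '<li><a href="http://www.test.com/contact">Contact</a></li>'
        '<li><a href="http://www.test.com/page4">Page4'
        '<span class="caret"></span></a></li></ul>')

# [^"] keeps the match inside one attribute value, and [^<] keeps the
# lookahead inside the same tag, so only the contact link matches.
print(FIXED.findall(html))  # ['http://www.test.com/contact']

# Hypothetical case: the link text is wrapped in another element.
nested = '<a href="http://www.test.com/contact"><span>Contact</span></a>'
print(FIXED.findall(nested))  # []
```

Note that the pattern only matches when "Contact" directly follows the `>` that closes the opening `<a ...>` tag. If the link text is wrapped in another element, as in `nested`, nothing is found.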