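I'm trying to write a regular expression that gets the URL from the href attribute of an <a> tag when the link is named Contact. This is the regular expression I've created: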
(?<=href=").*?(?=".*>[Cc]ontact)
It works fine if each href is on its own line, like this:
<div class="collapse navbar-collapse" role="navigation">
<ul class="navbar-right nav navbar-nav">
<li><a href="http://www.test.com/page1">Page1</a></li>
<li><a href="http://www.test.com/page2">Page2</a></li>
<li><a href="http://www.test.com/page3">Page3</a></li></ul></li>
<li><a href="http://www.test.com/contact">Contact</a></li>
<li><a href="http://www.test.com/page4">Page4<span class="caret"></span></a>
</ul>
</div>
Result:
http://www.test.com/contact
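However, if the markup is not nicely formatted and several hrefs are on the same line, it finds all the URLs up to and including the contact one, not just the contact URL. How can I fix this?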
<div class="collapse navbar-collapse" role="navigation"><ul class="navbar-right nav navbar-nav"><li><a href="http://www.test.com/page1">Page1</a></li><li><a href="http://www.test.com/page2">Page2</a></li><li><a href="http://www.test.com/page3">Page3</a></li></ul></li><li><a href="http://www.test.com/contact">Contact</a></li><li><a href="http://www.test.com/page4">Page4<span class="caret"></span></a></ul></div>
Result:
http://www.test.com/page1
http://www.test.com/page2
http://www.test.com/page3
http://www.test.com/contact
Answer 0 (score: 0)
You can use this regular expression:
(?<=href=")[^"]*?(?="[^<]*>[Cc]ontact)
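Explanation:
(?<=href=") # lookbehind: the match must be preceded by href="
[^"]*? # any characters other than ", i.e. the URL itself
(?="[^<]*>[Cc]ontact) # lookahead: the closing ", then any characters other than < up to the > that ends the opening tag, immediately followed by Contact or contact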
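To check the difference between the two patterns, here is a minimal sketch in Python. The choice of Python is an assumption, since the question does not say which regex engine is used. The names ORIGINAL, FIXED and sample are made up for illustration, and sample is a shortened version of the one-line markup from the question.

```python
import re

# Pattern from the question: the ".*" in the lookahead can run past other tags
# to any later "Contact" on the same line.
ORIGINAL = re.compile(r'(?<=href=").*?(?=".*>[Cc]ontact)')

# Pattern from the answer: "[^<]*" cannot cross a "<", so the lookahead
# stays inside the current opening tag.
FIXED = re.compile(r'(?<=href=")[^"]*?(?="[^<]*>[Cc]ontact)')

# Shortened one-line sample (hypothetical, based on the markup in the question).
sample = (
    '<ul><li><a href="http://www.test.com/page1">Page1</a></li>'
    '<li><a href="http://www.test.com/contact">Contact</a></li>'
    '<li><a href="http://www.test.com/page4">Page4<span class="caret"></span></a></li></ul>'
)

print(ORIGINAL.findall(sample))
# ['http://www.test.com/page1', 'http://www.test.com/contact']

print(FIXED.findall(sample))
# ['http://www.test.com/contact']
```

The original pattern only appeared to work on the multi-line input because . does not match a newline by default, so the lookahead could not reach a Contact on a later line.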