Hi, I'm having trouble with a regular expression.
Here is part of the source:
<div class="resultHeader googleHeader">
Wyniki z Google
</div>
<div class="boxResult2 ">
<div class="box ">
<div class="result">
<div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
<div class="source">
http://www.google.com/glass/start/
- <a rel="nofollow" href="query.html?hl=pl&qt=related:http%3A%2F%2Fwww.google.com%2Fglass%2Fstart%2F">Podobne strony</a>
</div><!-- source END -->
<div class="desc">Thanks for exploring with us. The journey doesn't end here. You'll start to see <br />
future versions of <b>Glass</b> when they're ready (for now, no peeking).</div>
</div><!-- result End -->
</div><!-- box End -->
<div class="box ">
<div class="result">
<div class="link"> <a href="http://pl.wikipedia.org/wiki/Google_Glass"><b>Google Glass</b> – Wikipedia, wolna encyklopedia</a> </div>
<div class="source">
http://pl.wikipedia.org/wiki/Google_Glass
- <a rel="nofollow" href="query.html?hl=pl&qt=related:http%3A%2F%2Fpl.wikipedia.org%2Fwiki%2FGoogle_Glass">Podobne strony</a>
</div><!-- source END -->
<div class="desc"><b>Google Glass</b> to okulary o rozszerzonej rzeczywistości stworzone przez firmę <br />
Google. Okulary te mają docelowo mieć funkcje standardowego smartfona, ale ...</div>
</div><!-- result End -->
</div><!-- box End -->
I want the link between <a href=" and ">, like this:
http://www.google.com/glass/start/
I wrote this: '<div class="link"> <a href="([^ ]+)"'
But it doesn't work. :(
Answer 0 (score: 3)
Since you're writing this in Python, I can suggest a solution based on Beautiful Soup.
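from bs4 import BeautifulSoup

html = 'YOUR STRING'
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, 'html.parser')
# Each <div class="link"> wraps one result link.
divs = soup.find_all("div", {"class": "link"})
for tag in divs:
    for t in tag.find_all("a"):
        if t.has_attr('href'):
            print(t['href'])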
Given your sample input, this prints:
http://www.google.com/glass/start/
http://pl.wikipedia.org/wiki/Google_Glass
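
If a regular expression is still preferred, here is a minimal sketch using only the standard `re` module. The question does not show how the pattern was applied, so this is a guess at a more tolerant variant rather than a diagnosis: `[^"]+` stops at the closing quote instead of relying on a space, `\s*` allows any whitespace between the two tags, and `findall` scans the whole string (unlike `re.match`, which only matches at the start). The names `html`, `LINK_RE` and `links` are made up for this example, and the markup is a trimmed copy of the sample from the question.

```python
import re

# Trimmed copy of the sample markup; in practice `html` would hold the whole page.
html = '''
<div class="link"> <a href="http://www.google.com/glass/start/"><b>Google Glass</b></a> </div>
<div class="source">
http://www.google.com/glass/start/
- <a rel="nofollow" href="query.html?hl=pl&qt=related:http%3A%2F%2Fwww.google.com%2Fglass%2Fstart%2F">Podobne strony</a>
</div><!-- source END -->
<div class="link"> <a href="http://pl.wikipedia.org/wiki/Google_Glass"><b>Google Glass</b></a> </div>
'''

# \s* tolerates any whitespace between the tags; [^"]+ captures up to the closing quote.
LINK_RE = re.compile(r'<div class="link">\s*<a href="([^"]+)"')

# findall returns the captured group for every match in the string.
links = LINK_RE.findall(html)
for url in links:
    print(url)
```

This skips the "Podobne strony" links because they are not directly inside a `<div class="link">`. It will break if the attribute order or quoting changes in the real page, which is why the parser-based approach above is the safer choice.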