Here is a snippet of my HTML, stored in a variable named soup. It is already a BeautifulSoup object:
<center>
<!--[if lt IE 7]>
<style type="text/css">
div, img { behavior: url(http://www.addic7ed.com/js/iepngfix.htc) }
</style>
<![endif]-->
<br /><center>
<!--Iframe Tag -->
<!-- begin ZEDO for channel: Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
<iframe src="http://d2.zedo.com/jsc/d2/ff2.html?n=2051;c=59;s=22;d=14;w=728;h=90" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" allowtransparency="true" width="728" height="90"></iframe>
<!-- end ZEDO for channel: Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
</center><br /><div id="container">
<table class="tabel70" border="0"><tr><!-- table header --><td class="tablecorner"><img src="http://www.addic7ed.com/images/tl.gif" /></td>
<td></td>
<td class="tablecorner"><img src="http://www.addic7ed.com/images/tr.gif" /></td>
</tr><tr><td></td>
<td>
<form action="/search.php" method="get">
<div align="center">
<input name="search" type="text" id="search" size="50" value="nikita 03x02" class="inputCool" /> 
<input name="Submit" type="submit" class="coolBoton" value="Search" /><br /><b>1 results found</b> </div><br /><center><br /><form action="https://www.paypal.com/cgi-bin/webscr" method="post">
<input type="hidden" name="cmd" value="_s-xclick" /><input type="hidden" name="hosted_button_id" value="EC7EPAVR5MXV6" /><input type="image" src="https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif" border="0" name="submit" alt="PayPal - The safer, easier way to pay online!" /><img alt="" border="0" src="https://www.paypal.com/en_US/i/scr/pixel.gif" width="1" height="1" /></form> <br /></center>
<br /><center>
<!--Iframe Tag -->
<!-- begin ZEDO for channel: Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
<iframe src="http://d2.zedo.com/jsc/d2/ff2.html?n=2051;c=59;s=22;d=14;w=728;h=90" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" allowtransparency="true" width="728" height="90"></iframe>
<!-- end ZEDO for channel: Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
</center>
<br /><table class="tabel" align="center" width="80%" border="0"><tr><td><img src="images/television.png" /></td><td><a href="serie/Nikita/3/2/Innocence" debug="68217">Nikita - 03x02 - Innocence</a></td></tr><tr><p>
</p><p>
</p></tr></table></form></td>
<td></td>
</tr><tr><!-- table footer --><td class="tablecorner"><img src="http://www.addic7ed.com/images/bl.gif" /></td>
<td></td>
<td class="tablecorner"><img src="http://www.addic7ed.com/images/br.gif" /></td>
</tr></table></div>
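I want to use BeautifulSoup and Python to get the href (i.e. "serie/Nikita/3/2/Innocence") from the table with class="tabel". Currently I can extract it with

soup.find(attrs={'class': 'tabel'}).find('a')['href']

but this seems a bit convoluted. Is there a simpler (more Pythonic) way to get this URL?
Cheers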
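For comparison, here is a minimal sketch of the lookup from the question next to two shorter equivalents. It assumes BeautifulSoup 4 (the bs4 package, which uses soupsieve for CSS selectors) rather than the older BeautifulSoup 3. The HTML string is a trimmed copy of the "tabel" table above, and the variable names are illustrative only:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the relevant table from the page in the question.
html = """
<table class="tabel" align="center" width="80%" border="0"><tr>
<td><img src="images/television.png" /></td>
<td><a href="serie/Nikita/3/2/Innocence" debug="68217">Nikita - 03x02 - Innocence</a></td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# The lookup from the question: match on a dict of attributes, then take the first <a>.
href_attrs = soup.find(attrs={"class": "tabel"}).find("a")["href"]

# Same lookup with the class_ keyword; tag.a is shorthand for tag.find("a").
href_kw = soup.find("table", class_="tabel").a["href"]

# Same lookup as a single CSS selector; select_one returns the first match or None.
href_css = soup.select_one("table.tabel a")["href"]

print(href_attrs, href_kw, href_css)
```

All three return the same string. The CSS selector keeps the "table with class tabel, then its link" relationship in one expression, while class_ with .a stays closest to the original chained calls.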
Answer 0 (score: 2)
Try this:
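from urllib.request import urlopen           # Python 3 replacement for urllib2.urlopen
from bs4 import BeautifulSoup, SoupStrainer

page = urlopen(url).read()
link_pat = SoupStrainer('a', href=True)      # parse only <a> tags that have an href
links = BeautifulSoup(page, 'html.parser', parse_only=link_pat)
for link in links.find_all('a'):
    href = link['href'].strip('/')           # drop leading/trailing slashes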
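A self-contained sketch of the same SoupStrainer technique, assuming bs4. The page string is a hypothetical stand-in for urlopen(url).read(). Note that the strainer keeps every <a> with an href on the page, not only the one inside the "tabel" table:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical stand-in for urlopen(url).read(): links inside and outside the "tabel" table.
page = """
<a href="/search.php">Search</a>
<table class="tabel"><tr><td>
<a href="serie/Nikita/3/2/Innocence">Nikita - 03x02 - Innocence</a>
</td></tr></table>
<a name="anchor-without-href">no href</a>
"""

# Only <a> tags carrying an href are kept while parsing; everything else is skipped.
only_links = SoupStrainer("a", href=True)
links = BeautifulSoup(page, "html.parser", parse_only=only_links)

# Collect the hrefs with leading/trailing slashes removed.
hrefs = [link["href"].strip("/") for link in links.find_all("a")]
print(hrefs)
```

The anchor without an href is dropped by the strainer, but the search link outside the table is still collected, so a class-based lookup is needed if only the "tabel" link is wanted.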