How to get an href from a specific HTML class using Python BeautifulSoup

Asked: 2013-12-02 17:51:14

Tags: python beautifulsoup

Here is a piece of my HTML, held in a variable named soup. It is already a BeautifulSoup object.

<center>

<!--[if lt IE 7]>
 <style type="text/css">
 div, img { behavior: url(http://www.addic7ed.com/js/iepngfix.htc) }
 </style>
<![endif]-->
<br /><center>
<!--Iframe Tag  -->

<!-- begin ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->

<iframe src="http://d2.zedo.com/jsc/d2/ff2.html?n=2051;c=59;s=22;d=14;w=728;h=90" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" allowtransparency="true" width="728" height="90"></iframe>

<!-- end ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
</center><br /><div id="container"> 
        <table class="tabel70" border="0"><tr><!-- table header --><td class="tablecorner"><img src="http://www.addic7ed.com/images/tl.gif" /></td>
                <td></td>
                <td class="tablecorner"><img src="http://www.addic7ed.com/images/tr.gif" /></td>
            </tr><tr><td></td>
                <td>
<form action="/search.php" method="get">
<div align="center">
<input name="search" type="text" id="search" size="50" value="nikita 03x02" class="inputCool" />&#160;
 <input name="Submit" type="submit" class="coolBoton" value="Search" /><br /><b>1 results found</b> </div><br /><center><br /><form action="https://www.paypal.com/cgi-bin/webscr" method="post">
    <input type="hidden" name="cmd" value="_s-xclick" /><input type="hidden" name="hosted_button_id" value="EC7EPAVR5MXV6" /><input type="image" src="https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif" border="0" name="submit" alt="PayPal - The safer, easier way to pay online!" /><img alt="" border="0" src="https://www.paypal.com/en_US/i/scr/pixel.gif" width="1" height="1" /></form> <br /></center>
<br /><center>
<!--Iframe Tag  -->

<!-- begin ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->

<iframe src="http://d2.zedo.com/jsc/d2/ff2.html?n=2051;c=59;s=22;d=14;w=728;h=90" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" allowtransparency="true" width="728" height="90"></iframe>

<!-- end ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
</center>
<br /><table class="tabel" align="center" width="80%" border="0"><tr><td><img src="images/television.png" /></td><td><a href="serie/Nikita/3/2/Innocence" debug="68217">Nikita - 03x02 - Innocence</a></td></tr><tr><p>
</p><p>
</p></tr></table></form></td>
                <td></td>
            </tr><tr><!-- table footer --><td class="tablecorner"><img src="http://www.addic7ed.com/images/bl.gif" /></td>
                <td></td>
                <td class="tablecorner"><img src="http://www.addic7ed.com/images/br.gif" /></td>
            </tr></table></div>

I want to use BeautifulSoup and Python to get the href (i.e. "serie/Nikita/3/2/Innocence") from the table with class="tabel".

At the moment I can extract it with:
soup.find(attrs={'class': 'tabel'}).find('a')['href']

But this seems a bit convoluted. Is there a simpler (more Pythonic) way to get this URL?

Cheers

1 Answer:

Answer 0 (score: 2)

Try this:

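from urllib.request import urlopen
from bs4 import BeautifulSoup, SoupStrainer

page = urlopen(url).read()  # url: address of the search results page
link_pat = SoupStrainer('a')  # parse only the <a> tags
links = BeautifulSoup(page, 'html.parser', parse_only=link_pat)
for link in links.find_all('a', href=True):  # skip anchors without an href
    href = link['href'].strip('/')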
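
If only the link inside the table with class "tabel" is wanted, a CSS selector may be a shorter route. The following is a minimal sketch, assuming the bs4 package (Beautiful Soup 4) and a trimmed-down copy of the markup from the question; the "tabel70" table with an "/other" link is a made-up stand-in added only to show that it is not matched.

```python
from bs4 import BeautifulSoup

# Trimmed-down markup based on the question. The first table and its
# "/other" link are hypothetical, added to show they are not selected.
html = """
<table class="tabel70" border="0"><tr><td><a href="/other">Other</a></td></tr></table>
<table class="tabel" align="center" width="80%" border="0">
  <tr><td><img src="images/television.png" /></td>
      <td><a href="serie/Nikita/3/2/Innocence" debug="68217">Nikita - 03x02 - Innocence</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: the first <a> carrying an href inside <table class="tabel">.
link = soup.select_one("table.tabel a[href]")
href = link["href"] if link is not None else None

# Equivalent lookup with find() and the class_ keyword argument.
table = soup.find("table", class_="tabel")
href_via_find = table.find("a", href=True)["href"] if table is not None else None

# Every matching link, in case the search returns more than one result.
all_hrefs = [a["href"] for a in soup.select("table.tabel a[href]")]

print(href)
```

Both lookups match the class token "tabel" exactly, so the outer "tabel70" table is ignored. The None checks are there because select_one() and find() return None when nothing matches, which would otherwise raise a TypeError on subscripting.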