Python: extracting hrefs inside a div

Time: 2014-07-16 05:30:40

Tags: python beautifulsoup

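I am trying to write a simple download utility in Python, a language I have not worked in before. The script should look for hrefs inside a specific div id and call a getfile() function for each href it finds. Here is the sample HTML source: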

<div class="tab-pane fade in active" id="home">
    <p><i class="icon-film icon-white"> <a target="_blank" href="/accounting?id=265">Video</a></i></p>
    <p><i class="icon-file icon-white"> <a target="_blank" href="/downloadpdf?id=265&type=pdf">&nbsp;PDF Slides</a></i></p>
    <p><i class="icon-download icon-white"> <a target="_blank" href="/downloadpdf?id=265&type=file">Additional Files</a></i></p>
</div>

I plan to use the beautifulsoup module to parse the page and extract the hrefs. So far I only have something like this:

f = urllib2.urlopen(url)
s = f.read()
soup = bs4.BeautifulSoup(s)
for a in soup.select('div.home'):
    print a.attrs.get('href')

At the moment it prints None.

1 Answer:

Answer 0: (score: 1)

Find all the hrefs inside the div with the class tab-pane fade in active:

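from bs4 import BeautifulSoup

# s is the HTML string read from the page, as in the question
soup = BeautifulSoup(s, 'html.parser')
for div in soup.findAll('div', {"class": "tab-pane fade in active"}):
    for link in div.findAll('a'):
        print link.get('href')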

Output

/accounting?id=265
/downloadpdf?id=265&type=pdf
/downloadpdf?id=265&type=file
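
Since the question asks about a specific div id rather than a class, a CSS selector on the id may be a closer fit. In a selector, `div.home` matches `class="home"`, while `div#home` matches `id="home"`. The sketch below is only an illustration under a few assumptions: it is written for Python 3, the `extract_hrefs` helper and the `BASE_URL` value are made up for the example, and the relative hrefs are joined to the base URL with `urljoin` so they could be handed to a download function such as the asker's getfile().

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """
<div class="tab-pane fade in active" id="home">
    <p><i class="icon-film icon-white"> <a target="_blank" href="/accounting?id=265">Video</a></i></p>
    <p><i class="icon-file icon-white"> <a target="_blank" href="/downloadpdf?id=265&type=pdf">&nbsp;PDF Slides</a></i></p>
    <p><i class="icon-download icon-white"> <a target="_blank" href="/downloadpdf?id=265&type=file">Additional Files</a></i></p>
</div>
"""

# Hypothetical base URL; replace it with the page the HTML was fetched from.
BASE_URL = "http://example.com/"


def extract_hrefs(html, div_id="home"):
    """Return the href of every <a> inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    # '#' matches an id; 'div.home' would look for class="home" instead.
    return [a["href"] for a in soup.select("div#%s a[href]" % div_id)]


hrefs = extract_hrefs(HTML)
# The hrefs are site-relative, so join them to the base URL before downloading.
absolute_urls = [urljoin(BASE_URL, href) for href in hrefs]
for url in absolute_urls:
    print(url)  # a download helper such as getfile(url) could be called here
```

Selecting the `a` elements directly also avoids reading `href` from the div itself, which has no such attribute.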