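I am trying to write a simple download utility in Python, a language I have not developed in before. The script should look for hrefs inside a specific div id, and for every href it finds it should call a getfile() function. Here is a sample of the HTML source: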
<div class="tab-pane fade in active" id="home">
<p><i class="icon-film icon-white"> <a target="_blank" href="/accounting?id=265">Video</a></i></p>
<p><i class="icon-file icon-white"> <a target="_blank" href="/downloadpdf?id=265&type=pdf"> PDF Slides</a></i></p>
<p><i class="icon-download icon-white"> <a target="_blank" href="/downloadpdf?id=265&type=file">Additional Files</a></i></p>
</div>
I plan to use the beautifulsoup module to parse the page and extract the hrefs. So far I only have something like this:
f = urllib2.urlopen(url)
s = f.read()
soup = bs4.BeautifulSoup(s)
for a in soup.select('div.home'):
    print a.attrs.get('href')
Currently this prints None.
Answer 0: (score: 1)
Find all the hrefs inside the div whose class is tab-pane fade in active:
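from bs4 import BeautifulSoup

# s is the HTML string read from the page (f.read() in the question)
soup = BeautifulSoup(s)
for a in soup.findAll('div', {"class": "tab-pane fade in active"}):
    # print the href of every link nested inside the matched div
    for b in a.findAll('a'):
        print b.get('href')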
Output:
/accounting?id=265
/downloadpdf?id=265&type=pdf
/downloadpdf?id=265&type=file
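Since the question asks for a lookup by div id rather than by class, here is a minimal sketch of that variant. Note that 'div.home' in the question is a class selector (the id form would be 'div#home'), and the div itself has no href attribute, which is why None was printed. The sketch assumes Python 3 and bs4 with the built-in "html.parser" backend; extract_hrefs is a hypothetical helper name, and getfile here is only a placeholder standing in for the asker's real download function.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """
<div class="tab-pane fade in active" id="home">
<p><i class="icon-film icon-white"> <a target="_blank" href="/accounting?id=265">Video</a></i></p>
<p><i class="icon-file icon-white"> <a target="_blank" href="/downloadpdf?id=265&type=pdf"> PDF Slides</a></i></p>
<p><i class="icon-download icon-white"> <a target="_blank" href="/downloadpdf?id=265&type=file">Additional Files</a></i></p>
</div>
"""


def extract_hrefs(html, div_id):
    """Return the href of every <a> nested inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=div_id)
    if container is None:
        # No div with that id on the page: nothing to download.
        return []
    # href=True keeps only the anchors that actually carry an href attribute.
    return [a["href"] for a in container.find_all("a", href=True)]


def getfile(href):
    # Hypothetical placeholder: the real function would download the resource.
    print("would download:", href)


hrefs = extract_hrefs(HTML, "home")
for href in hrefs:
    getfile(href)
```

The hrefs in the sample are relative, so a real getfile() would probably need to join them with the page URL before fetching.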