I have an HTML page, and I want to extract the value of the "href" attribute from every link tag in it.
Here is the HTML page:
<div class="universal">
<div class="slider">
<a class="focus" href="/1295%2C"><div><div><div>St</div></div></div></a>,
<a class="focus" href="/2395%2C"><div><div><div>GT</div></div></div></a>
</div>
<div class="slider">
<a class="focus" href="/3495%2C"><div><div><div>KT</div></div></div></a>,
<a class="focus" href="/4595%2C"><div><div><div>LT</div></div></div></a>
</div>
<div class="slider">
<a class="focus" href="/5695%2C"><div><div><div>OT</div></div></div></a>,
<a class="focus" href="/6795%2C"><div><div><div>OT</div></div></div></a>,
<a class="focus" href="/7895%2C"><div><div><div>OT</div></div></div></a>
</div>
I tried the following code:
from bs4 import BeautifulSoup
response = html_page
html_text = BeautifulSoup(response, "html.parser")
shows = html_text.find('div', {'class': 'slider'}).findAll('a', {'class': 'focus'})
urls = []
for a_tag in shows:
    urls.append(a_tag.find('a', {'class': 'focus'}).attrs['href'])
print urls
It fails with the error: 'NoneType' object has no attribute 'findAll'. Please help.
Answer 0 (score: 0)
Here is one approach using find_all.
Demo:
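from bs4 import BeautifulSoup

# html is assumed to hold the page source shown in the question
html_text = BeautifulSoup(html, "html.parser")
# find_all returns every matching div; find returns only the first match (or None)
shows = html_text.find_all('div', {'class': 'slider'})
urls = []
for div in shows:
    # collect the href of each a.focus link inside this slider
    for a_tag in div.find_all('a', {'class': 'focus'}):
        urls.append(a_tag.attrs['href'])
print(urls)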
Output (as printed by Python 2; under Python 3 the strings appear without the u prefix):
[u'/1295%2C', u'/2395%2C', u'/3495%2C', u'/4595%2C', u'/5695%2C', u'/6795%2C', u'/7895%2C']
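As a hedged follow-up, the sketch below illustrates the likely cause of the NoneType error and a more compact alternative using a CSS selector. It assumes Python 3 and a reasonably recent BeautifulSoup 4. The html string is a trimmed, hypothetical copy of the page from the question, and the class name no-such-class is made up purely to show what find returns when nothing matches.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the page in the question (assumption: the real page has the same shape).
html = """
<div class="universal">
  <div class="slider">
    <a class="focus" href="/1295%2C"><div>St</div></a>,
    <a class="focus" href="/2395%2C"><div>GT</div></a>
  </div>
  <div class="slider">
    <a class="focus" href="/3495%2C"><div>KT</div></a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches; chaining another call onto that
# None is what raises "'NoneType' object has no attribute ...".
missing = soup.find("div", {"class": "no-such-class"})

# find() searches only descendants, so asking an <a> tag for a nested <a>
# also yields None on this markup.
first_link = soup.find("a", {"class": "focus"})
nested = first_link.find("a", {"class": "focus"})

# A CSS selector collects every a.focus under any div.slider in one call.
urls = [a["href"] for a in soup.select("div.slider a.focus")]
print(urls)
```

If the real page may contain matching links without an href, a.get("href") returns None for those instead of raising KeyError, so filtering on that value may be the safer choice.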