I have some HTML code, and I need to extract the title and href of certain categories inside a class. The HTML is:
<div class="submenu_img3" >
<ul class="submenu_list3 visible_false">
<li class="">
<input type="hidden" name="has_subcategories" value="0"/>
<input type="hidden" name="has_thirdlevel" value="0"/>
<input type="hidden" name="level" value="0"/>
<input type="hidden" name="posicion" value="0"/>
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_6511">
<span class="txt" >
Cerdo selecta </span>
</a>
</li>
<li class="">
<input type="hidden" name="has_subcategories" value="0"/>
<input type="hidden" name="has_thirdlevel" value="0"/>
<input type="hidden" name="level" value="2"/>
<input type="hidden" name="posicion" value="1"/>
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_130201">
<span class="txt" >
Cerdo Blanco </span>
</a>
</li>
<li class="">
<input type="hidden" name="has_subcategories" value="0"/>
<input type="hidden" name="has_thirdlevel" value="0"/>
<input type="hidden" name="level" value="2"/>
<input type="hidden" name="posicion" value="2"/>
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_130202">
<span class="txt" >
Cerdo de Teruel </span>
</a>
</li>
<li class="">
<input type="hidden" name="has_subcategories" value="0"/>
<input type="hidden" name="has_thirdlevel" value="0"/>
<input type="hidden" name="level" value="2"/>
<input type="hidden" name="posicion" value="3"/>
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_130203">
<span class="txt" >
Cerdo Ibérico </span>
</a>
</li>
But with this code I get nothing:
for row in soup.find_all('div',attrs={"class" : "submenu_img3"}, href=True):
    print row.text
    print row.a['href']
Can you help me? Thanks, and sorry for my English!
Answer 0 (score: 2)
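I am guessing your intent is to get the href and text of every a tag inside every div tag with class submenu_img3. The problem with your find_all call is the href=True argument: it asks BeautifulSoup to find div tags that themselves have an href attribute, and there are none in the HTML, so the loop never runs.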
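To make the diagnosis above concrete, here is a minimal sketch that keeps find_all but moves href=True to the a tags. It assumes a shortened two-entry copy of the question's markup; the names no_matches and links are made up for illustration, and the prints use a single-argument form so the snippet runs on Python 3 as well.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (two entries only, for illustration).
html_doc = """
<div class="submenu_img3">
<ul class="submenu_list3 visible_false">
<li class="">
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_6511">
<span class="txt"> Cerdo selecta </span>
</a>
</li>
<li class="">
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_130201">
<span class="txt"> Cerdo Blanco </span>
</a>
</li>
</ul>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# The call from the question: href=True is applied to the div itself,
# and the div has no href attribute, so nothing matches.
no_matches = soup.find_all("div", attrs={"class": "submenu_img3"}, href=True)

# Filter the div by class first, then look for links inside it.
links = []
for div in soup.find_all("div", attrs={"class": "submenu_img3"}):
    for a in div.find_all("a", href=True):
        links.append((a.get_text(strip=True), a["href"]))

for text, href in links:
    print("Text: %s" % text)
    print("Href: %s" % href)
```

Nesting two find_all calls keeps the original approach recognizable; the only change is which tag the href filter applies to.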
I find it easier to use the select call, which accepts CSS selectors. Here is the code to find all a tags inside div tags with class submenu_img3:
soup = BeautifulSoup(html_doc, 'html.parser')
for row in soup.select('div.submenu_img3 a'):
    print("Text: %s" % row.text.strip())
    print("Href: %s" % row['href'])
Full code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
html_doc = """
<div class="submenu_img3" >
<ul class="submenu_list3 visible_false">
<li class="">
<input type="hidden" name="has_subcategories" value="0"/>
<input type="hidden" name="has_thirdlevel" value="0"/>
<input type="hidden" name="level" value="0"/>
<input type="hidden" name="posicion" value="0"/>
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_6511">
<span class="txt" > Cerdo selecta </span>
</a>
</li>
<li class="">
<input type="hidden" name="has_subcategories" value="0"/>
<input type="hidden" name="has_thirdlevel" value="0"/>
<input type="hidden" name="level" value="2"/>
<input type="hidden" name="posicion" value="1"/>
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_130201">
<span class="txt" > Cerdo Blanco</span>
</a>
</li>
<li class="">
<input type="hidden" name="has_subcategories" value="0"/>
<input type="hidden" name="has_thirdlevel" value="0"/>
<input type="hidden" name="level" value="2"/>
<input type="hidden" name="posicion" value="2"/>
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_130202">
<span class="txt" > Cerdo de Teruel </span>
</a>
</li>
<li class="">
<input type="hidden" name="has_subcategories" value="0"/>
<input type="hidden" name="has_thirdlevel" value="0"/>
<input type="hidden" name="level" value="2"/>
<input type="hidden" name="posicion" value="3"/>
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_130203">
<span class="txt" > Cerdo Ibérico </span>
</a>
</li>
</ul>
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
for row in soup.select('div.submenu_img3 a'):
    # Single-argument print works the same on Python 2 and Python 3.
    print("Text: %s" % row.text.strip())
    print("Href: %s" % row['href'])
See the w3schools reference for CSS selectors. CSS selectors are very powerful.
http://www.w3schools.com/cssref/css_selectors.asp
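As a rough illustration of what CSS selectors can express, the sketch below tries a few selector variants against a shortened two-entry copy of the question's markup. The variable names are hypothetical, and the selectors are just examples of standard CSS syntax (attribute presence, child combinator, attribute suffix match) that select is expected to handle in current Beautiful Soup releases.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (two entries only, for illustration).
html_doc = """
<div class="submenu_img3">
<ul class="submenu_list3 visible_false">
<li class="">
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_6511">
<span class="txt"> Cerdo selecta </span>
</a>
</li>
<li class="">
<a href="https://www.alimentacion.alcampo.es/tienda/index.php?cPath=2112_13_1302_130201">
<span class="txt"> Cerdo Blanco </span>
</a>
</li>
</ul>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Attribute presence: only a tags that actually carry an href.
with_href = soup.select("div.submenu_img3 a[href]")

# Child combinator: a tags that are direct children of li items in the list.
direct_links = soup.select("ul.submenu_list3 > li > a")

# Class selector on a nested tag: the span elements holding the label text.
labels = [span.get_text(strip=True) for span in soup.select("div.submenu_img3 span.txt")]

# Attribute suffix match: links whose href ends with a given category id.
blanco = soup.select('a[href$="130201"]')

print(len(with_href), len(direct_links))
print(labels)
print(blanco[0]["href"])
```

Which variant to pick depends on how strict the match should be; the descendant selector from the answer is the most forgiving if the page layout changes.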