I need to get the values of deeply nested <span> elements in a DOM structure like this:
<div class="panda">
<div class="that">
<ul class="foo">
<li class="bar">
<div class="hi">
<p class="bye">
<span class="cheese">Cheddar</span>
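The problem with
soup.findAll("span", {"class": "cheese"})
is that the page has hundreds of span elements with class "cheese", so I need to keep only those nested inside an element with class "panda". I want to end up with a list like ["Cheddar", "Parmesan", "Swiss"].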
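Answer 0 (score: 2)
Use a CSS selector, where the space in '.panda .cheese' matches any .cheese element nested at any depth inside a .panda element: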
[e.get_text() for e in soup.select('.panda .cheese')]
Or, if you prefer find_all:
# Calling a soup or tag is the same as find_all
[e.get_text() for panda in soup('div', {'class': 'panda'})
for e in panda('span', {'class': 'cheese'})]
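To try both approaches end to end, here is a minimal sketch. The sample HTML is invented for illustration (it is not the asker's page), it assumes the bs4 package is installed, and it uses the built-in "html.parser" backend. The "Gouda" span sits outside any .panda element, so it should be filtered out.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for this example.
html = """
<div class="panda">
  <ul class="foo">
    <li class="bar"><p class="bye"><span class="cheese">Cheddar</span></p></li>
    <li class="bar"><span class="cheese">Parmesan</span></li>
  </ul>
</div>
<div class="panda"><span class="cheese">Swiss</span></div>
<div class="other"><span class="cheese">Gouda</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS descendant selector: every .cheese anywhere inside a .panda.
via_select = [e.get_text() for e in soup.select(".panda .cheese")]

# Equivalent nested find_all: search each .panda div for .cheese spans.
via_find_all = [
    e.get_text()
    for panda in soup.find_all("div", class_="panda")
    for e in panda.find_all("span", class_="cheese")
]

print(via_select)
print(via_find_all)
```

One difference to keep in mind: if .panda elements are nested inside each other, the find_all version visits the inner spans once per enclosing .panda and so can return duplicates, while select returns each matching element once.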