How to get deeply nested div values using BeautifulSoup?

Time: 2014-12-08 09:34:00

Tags: python beautifulsoup

I need to get the value of a deeply nested <span> element in a DOM structure like the following:

<div class="panda">
    <div class="that">
        <ul class="foo">
            <li class="bar">
                <div class="hi">
                    <p class="bye">
                        <span class="cheese">Cheddar</span>

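The problem with

soup.findAll("span", {"class": "cheese"})

is that the page contains hundreds of span elements with the class "cheese", so I need to restrict the search to those inside an element with the class "panda". The result I need is a list of values such as ["Cheddar", "Parmesan", "Swiss"].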

1 answer:

Answer 0: (score: 2)

Use a CSS selector:

[e.get_text() for e in soup.select('.panda .cheese')]
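
To see how the selector narrows the search, here is a minimal, self-contained sketch. The sample markup (including the "Parmesan" and "Gouda" spans and the "other" div) is invented for illustration, and "html.parser" is just the parser assumed here; any installed parser should behave the same for this input.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two "cheese" spans inside div.panda, one outside it.
html = """
<div class="panda">
  <div class="that">
    <ul class="foo">
      <li class="bar"><div class="hi"><p class="bye"><span class="cheese">Cheddar</span></p></div></li>
      <li class="bar"><div class="hi"><p class="bye"><span class="cheese">Parmesan</span></p></div></li>
    </ul>
  </div>
</div>
<div class="other"><span class="cheese">Gouda</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".panda .cheese" is a descendant selector: it matches every element with
# class "cheese" at any depth below an element with class "panda".
cheeses = [e.get_text() for e in soup.select(".panda .cheese")]
print(cheeses)  # ['Cheddar', 'Parmesan']
```

The span inside the "other" div is skipped because it has no ancestor with the class "panda".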

Or, if you prefer find_all:

# Calling a soup or tag is the same as find_all

[e.get_text() for panda in soup('div', {'class': 'panda'}) 
              for e in panda('span', {'class': 'cheese'})]
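
The same lookup can be spelled out with explicit find_all calls, which some readers find easier to follow than calling the soup directly. This is only a sketch: the helper name cheeses_in_pandas and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup


def cheeses_in_pandas(soup):
    """Hypothetical helper: text of each span.cheese found under a div.panda."""
    results = []
    # First find every div with class "panda" ...
    for panda in soup.find_all("div", class_="panda"):
        # ... then search only within that div for the cheese spans.
        for span in panda.find_all("span", class_="cheese"):
            results.append(span.get_text())
    return results


# Invented sample markup: one cheese outside div.panda, two inside it.
html = """
<span class="cheese">Gouda</span>
<div class="panda">
  <ul class="foo">
    <li class="bar"><span class="cheese">Cheddar</span></li>
    <li class="bar"><span class="cheese">Parmesan</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
result = cheeses_in_pandas(soup)
print(result)  # ['Cheddar', 'Parmesan']
```

One difference worth keeping in mind: find_all searches all descendants, so if one "panda" div were nested inside another, this loop would visit the inner spans once per enclosing div and list them more than once, whereas select returns each matching element a single time.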