How do I find only the direct children of a div (not the grandchildren) in HTML using BeautifulSoup?

Time: 2019-06-17 17:41:28

Tags: python-3.x beautifulsoup html-parsing

Markup:

<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>

I want a list of only the direct children of the div with class "parent-div",

that is, the list should be:

[<div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>,<div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>]

I am using the following BeautifulSoup code:

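from bs4 import BeautifulSoup as soup  # assumed import; page_html holds the page source
page_soup = soup(page_html,"html.parser")
main_cont = page_soup.find('div',{'class':'parent-div'}).findAll('div')  # searches all descendants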

This code gives me a list of all the divs, including the grandchildren:

[<div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>,<div class = "child-1.1">
        </div>,<div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>,<div class = "child-2.1">
        </div>]
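The grandchildren appear because find_all() (findAll() is its older alias) searches every descendant of the tag by default, not just the first level. A minimal check, assuming bs4 is installed and reusing the markup from the question:

```python
from bs4 import BeautifulSoup

html = """
<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("div", {"class": "parent-div"})

# Default find_all() is recursive: it walks children, grandchildren, and so on
all_divs = parent.find_all("div")

# "class" is multi-valued, so each tag's class is a list; take the first entry
print([d["class"][0] for d in all_divs])  # ['child-1', 'child-1.1', 'child-2', 'child-2.1']
```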

How do I get a list of only the direct children of the parent div?

2 Answers:

Answer 0 (score: 0)

You can use the findChildren() method with recursive=False, which restricts the search to the tag's direct children:

main_cont = soup.find('div',{'class':'parent-div'}).findChildren('div',recursive=False)

Output:

[<div class="child-1"><div class="child-1.1"></div></div>, <div class="child-2"><div class="child-2.1"> </div></div>]
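In current Beautiful Soup 4, findChildren() is an older alias of find_all(), so the same idea can be written with the snake_case name. A self-contained sketch, assuming bs4 is installed and using the question's markup; the whitespace text nodes between the tags are ignored because only div tags are matched:

```python
from bs4 import BeautifulSoup

html = """
<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("div", {"class": "parent-div"})

# recursive=False only looks at the parent's immediate children
direct = parent.find_all("div", recursive=False)

print([d["class"][0] for d in direct])  # ['child-1', 'child-2']
```

Each element of direct is still a full Tag, so its own nested divs (child-1.1, child-2.1) remain reachable through it.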

Answer 1 (score: 0)

You can do this easily with a CSS selector (note: this requires Beautiful Soup 4.7+). Specifically, use the child combinator: https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator

from bs4 import BeautifulSoup

html = """
<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

print(soup.select('div.parent-div > *'))

Output:

[<div class="child-1">\n<div class="child-1.1">\n</div>\n</div>, <div class="child-2">\n<div class="child-2.1">\n</div>\n</div>]
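Note that "> *" matches direct children of any element type. If only div children are wanted, the universal selector can be replaced with a type selector. A sketch, assuming bs4 4.7+; the span element below is hypothetical and was added to the question's markup only to show the difference:

```python
from bs4 import BeautifulSoup

# Question's markup plus a hypothetical <span> child, for illustration only
html = """
<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <span class = "note"></span>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ">" is the child combinator: only elements exactly one level below match
any_children = soup.select("div.parent-div > *")    # any element type
div_children = soup.select("div.parent-div > div")  # div elements only

print([el.name for el in any_children])         # ['div', 'span', 'div']
print([el["class"][0] for el in div_children])  # ['child-1', 'child-2']
```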