Markup:
<div class = "parent-div">
<div class = "child-1">
<div class = "child-1.1">
</div>
</div>
<div class = "child-2">
<div class = "child-2.1">
</div>
</div>
</div>
I want to get a list of the direct children of the div with class "parent-div",
i.e. the list should be:
[<div class = "child-1">
<div class = "child-1.1">
</div>
</div>,<div class = "child-2">
<div class = "child-2.1">
</div>
</div>]
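I am using the following BeautifulSoup code:
from bs4 import BeautifulSoup as soup  # assumed: `soup` is an alias for the BeautifulSoup class
page_soup = soup(page_html,"html.parser")
main_cont = page_soup.find('div',{'class':'parent-div'}).findAll('div')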
This code gives me a list of all the divs, including the nested ones:
[<div class = "child-1">
<div class = "child-1.1">
</div>
</div>,<div class = "child-1.1">
</div>,<div class = "child-2">
<div class = "child-2.1">
</div>
</div>,<div class = "child-2.1">
</div>]
How can I get a list of only the direct children of the parent div?
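Answer 0 (score: 0)
You can use the findChildren() method with recursive=False to get only the direct child tags. By default the search descends into every level, which is why findAll('div') also returned the nested divs.
main_cont = page_soup.find('div',{'class':'parent-div'}).findChildren('div',recursive=False)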
Output:
[<div class="child-1"><div class="child-1.1"></div></div>, <div class="child-2"><div class="child-2.1"> </div></div>]
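For reference, here is a minimal self-contained sketch of the same idea. It assumes the markup from the question is held in a string, and it uses find_all(), of which findChildren() is an older alias in Beautiful Soup 4. The variable names parent, direct_children and child_classes are only illustrative.

```python
from bs4 import BeautifulSoup

# Markup from the question, written inline so the sketch runs on its own
html = """
<div class="parent-div">
  <div class="child-1"><div class="child-1.1"></div></div>
  <div class="child-2"><div class="child-2.1"></div></div>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
parent = page_soup.find("div", class_="parent-div")

# recursive=False limits the search to direct children of `parent`
direct_children = parent.find_all("div", recursive=False)

# The class attribute is returned as a list, so take its first entry
child_classes = [tag["class"][0] for tag in direct_children]
print(child_classes)  # ['child-1', 'child-2']
```

Without recursive=False the same call would return four divs, because the nested child-1.1 and child-2.1 elements would be matched as well.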
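Answer 1 (score: 0)
You can do this easily with a CSS selector. Note: this requires Beautiful Soup 4.7+. Specifically, use the child combinator (>), which matches only elements that are direct children of the element on its left: https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator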
from bs4 import BeautifulSoup
html = """
<div class = "parent-div">
<div class = "child-1">
<div class = "child-1.1">
</div>
</div>
<div class = "child-2">
<div class = "child-2.1">
</div>
</div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('div.parent-div > *'))
Output:
[<div class="child-1">\n<div class="child-1.1">\n</div>\n</div>, <div class="child-2">\n<div class="child-2.1">\n</div>\n</div>]
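Note that the selector `div.parent-div > *` matches direct children of any tag type. If only div children are wanted, a variant such as the sketch below may be closer to what the question asks. It also shows the same result obtained by iterating over .children and keeping only Tag objects, since .children yields whitespace text nodes too. The variable names are illustrative, not part of the original answer.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """
<div class="parent-div">
  <div class="child-1"><div class="child-1.1"></div></div>
  <div class="child-2"><div class="child-2.1"></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator restricted to div elements only
via_selector = soup.select("div.parent-div > div")

# Equivalent approach: walk .children and skip the text nodes between tags
parent = soup.select_one("div.parent-div")
via_children = [node for node in parent.children if isinstance(node, Tag)]

selector_classes = [tag["class"][0] for tag in via_selector]
children_classes = [tag["class"][0] for tag in via_children]
print(selector_classes)  # ['child-1', 'child-2']
print(children_classes)  # ['child-1', 'child-2']
```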