Using bs4 in Python to find tags that have a certain child tag

Time: 2018-07-25 10:11:35

Tags: python html beautifulsoup

I have HTML in the following format:

<div class="consider">
    <div class="row">
        <p>Text1</p>
    </div>
</div>
<div class="consider">
    <h2>Hello</h2>
</div>
<div class="Consider">
    <div class="row">
        <p>Text2</p>
    </div>
</div>

I only want to get a div tag if its child tag (a div) has the class "row".
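For reference, the condition in the question can also be written directly as a filter function passed to find_all(), which Beautiful Soup calls on every tag in the document. This sketch is not taken from either answer below; has_row_child is a hypothetical helper name, and it uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

html = """
<div class="consider">
    <div class="row">
        <p>Text1</p>
    </div>
</div>
<div class="consider">
    <h2>Hello</h2>
</div>
<div class="Consider">
    <div class="row">
        <p>Text2</p>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def has_row_child(tag):
    """Hypothetical helper: True for a <div> with a direct child <div class="row">."""
    # recursive=False restricts the search to direct children only
    return tag.name == "div" and tag.find("div", class_="row", recursive=False) is not None


matches = soup.find_all(has_row_child)
for div in matches:
    print(div["class"], div.p.get_text())
```

Because the filter checks only direct children, the inner div.row elements themselves are not matched, and the div containing only an h2 is skipped.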

2 answers:

Answer 0 (score: 0)

Here is how you can access it:

from bs4 import BeautifulSoup

content = '<div class="consider"><div class="row"><p>Text1</p></div></div><div class="consider"><h2>Hello</h2></div><div class="Consider"><div class="row"><p>Text2</p></div></div>'
soup = BeautifulSoup(content, 'lxml')
for div in soup.find_all('div', class_='row'):
    if div.parent.name == "div":
        # div.parent is the element you want; do whatever you need with it
        print(div.parent)
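One caveat with looping over the .row divs: if an outer div contains more than one row child, the same parent is reached once per child. A minimal sketch that collects each parent only once, assuming a hypothetical input in which the first outer div has two row children (html.parser is used here to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup

# Hypothetical input: the first outer div has TWO div.row children.
html = ('<div class="consider"><div class="row"><p>A</p></div>'
        '<div class="row"><p>B</p></div></div>'
        '<div class="consider"><h2>Hello</h2></div>')
soup = BeautifulSoup(html, "html.parser")

parents = []
seen = set()
for row in soup.find_all("div", class_="row"):
    parent = row.parent
    # Keep only <div> parents and record each one once. Identity (id()) is used
    # because Tag equality compares markup, not object identity.
    if parent.name == "div" and id(parent) not in seen:
        seen.add(id(parent))
        parents.append(parent)

print(len(parents))
```

Without the seen set, the first outer div would appear twice in the result.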

Answer 1 (score: 0)

With select('div > div.row') we select every div tag with class row that is a direct child of a div tag, and then use a list comprehension to take the parent of each of those tags:

from bs4 import BeautifulSoup

data = '<div class="consider"><div class="row"><p>Text1</p></div></div><div class="consider"><h2>Hello</h2></div><div class="Consider"><div class="row"><p>Text2</p></div></div>'
soup = BeautifulSoup(data, 'lxml')

# div.row elements whose direct parent is a div, mapped to those parent divs
divs = [div.parent for div in soup.select('div > div.row')]

print(divs)

Output:

[<div class="consider"><div class="row"><p>Text1</p></div></div>, <div class="Consider"><div class="row"><p>Text2</p></div></div>]
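On Beautiful Soup 4.7.0 and later, where select() is backed by the soupsieve package, the outer divs can be selected directly with the :has() pseudo-class instead of selecting the .row children and stepping up to .parent. This is a sketch that is not part of the original answer and assumes such a version is installed; it uses html.parser rather than lxml.

```python
from bs4 import BeautifulSoup

data = ('<div class="consider"><div class="row"><p>Text1</p></div></div>'
        '<div class="consider"><h2>Hello</h2></div>'
        '<div class="Consider"><div class="row"><p>Text2</p></div></div>')
soup = BeautifulSoup(data, "html.parser")

# ":has(> div.row)" keeps a <div> only if one of its direct children is a
# div with class "row", so no .parent step is needed afterwards.
divs = soup.select("div:has(> div.row)")
print(divs)
```

Because each outer div is matched once, this form also avoids the duplicate-parent issue when a div has several row children.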