I have HTML like this:
<div class = first>
<div class = second>
<div class = fourth>
<div class = first>
<div class = second>
<div class = third>
<div class = first>
<div class = second>
<div class = fourth>
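The information in div.third differs from that in div.fourth, but what I need is the information in the div.second that sits in the same div.first as div.third. So I need to use div.third to identify the right div.second.
More precisely, I need to do the following: if a div.first contains a div.third, store that element's div.second (or some other specific div) in a variable, so that I can extract some text from it.
I have already tried find_all along with parent and children operations, but I can't get it to work. Any help would be much appreciated.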
Update (example): an example from a flight search site:
<div class = booking class>
<div class = price>
<div class = non refundable>
<div class = booking class>
<div class = price>
<div class = refundable>
<div class = booking class>
<div class = price>
<div class = non refundable>
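Answer 0 (score: 0)
I think that when you say "contains" you mean the price div is the parent of the refundable div. In that case, look for the refundable div and check whether its parent is a div with the class name price; if it is, you have what you need: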
from bs4 import BeautifulSoup

html = """
<div class="booking class">
  <div class="price">
    <div class="refundable"></div>
  </div>
</div>
<div class="booking class">
  <div class="price">
    <div class="non refundable"></div>
  </div>
</div>"""
soup = BeautifulSoup(html, "lxml")
# Match only divs whose class list is exactly ["refundable"], which excludes "non refundable".
for div in soup.find_all(lambda tag: tag.name == "div" and tag.get("class") == ["refundable"]):
    par = div.parent
    # Keep the parent only if it is the price div.
    if par and par.name == "div" and par.get("class") == ["price"]:
        print(par)
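The lambda above compares the whole class list instead of passing class_="refundable" to find_all. The sketch below illustrates why that matters: Beautiful Soup treats class as a multi-valued attribute, so class_ matches any single class token and would also pick up "non refundable". The markup is a trimmed, hypothetical version of the example, and html.parser is used only so the sketch needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: one refundable and one non-refundable fare.
html = """
<div class="price"><div class="refundable"></div></div>
<div class="price"><div class="non refundable"></div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches any single class token, so "non refundable" matches too.
loose = soup.find_all("div", class_="refundable")

# Comparing the full class list keeps only the div whose class is exactly "refundable".
strict = soup.find_all(
    lambda tag: tag.name == "div" and tag.get("class") == ["refundable"]
)

print(len(loose), len(strict))  # prints: 2 1
```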
If they are siblings instead:
html = """
<div class ="booking class">
<div class="price"> </div>
<div class="refundable"></div>
</div>
<div class ="booking class">
<div class="price"> </div>
<div class="non refundable"></div>
</div>
<div class ="booking class">
<div class="price"> </div>
<div class="non refundable"></div>
</div>
"""
soup = BeautifulSoup(html, "lxml")  # re-parse the sibling layout before searching it
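If the price div always comes before the refundable div, use find_previous_sibling:
for div in soup.find_all(lambda tag: tag.name == "div" and tag.get("class") == ["refundable"]):
    # Look backwards among the siblings for the price div.
    sib = div.find_previous_sibling("div", "price")
    if sib:
        print(div.parent)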
Or check both before and after:
for div in soup.find_all(lambda tag: tag.name == "div" and tag.get("class") == ["refundable"]):
    # Try the preceding siblings first, then the following ones.
    sib = div.find_previous_sibling("div", "price") or div.find_next_sibling("div", "price")
    if sib:
        print(div.parent)
If the price div can be anywhere inside the parent:
for div in soup.find_all(lambda tag: tag.name == "div" and tag.get("class") == ["refundable"]):
    # Search all descendants of the shared parent for the price div.
    sib = div.parent.find("div", "price")
    if sib:
        print(div.parent)
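The same idea can be applied to the first/second/third layout from the question. This is a minimal sketch, assuming div.second and div.third are both nested somewhere inside the same div.first; the text inside each div ("A", "B", "C" and so on) is made up for illustration, and html.parser is used only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the question's layout; the text content is invented.
html = """
<div class="first">
  <div class="second">A</div>
  <div class="fourth">x</div>
</div>
<div class="first">
  <div class="second">B</div>
  <div class="third">y</div>
</div>
<div class="first">
  <div class="second">C</div>
  <div class="fourth">z</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

matches = []
for third in soup.find_all("div", class_="third"):
    # Walk up to the enclosing div.first, then back down to its div.second.
    first = third.find_parent("div", class_="first")
    second = first.find("div", class_="second") if first else None
    if second is not None:
        matches.append(second)

# Each stored div.second can now be used to capture its text.
texts = [second.get_text(strip=True) for second in matches]
print(texts)  # prints: ['B']
```

Going up with find_parent and back down with find does not depend on whether div.second comes before or after div.third, which makes it the most forgiving of the variants when the real page structure is uncertain.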