How can I use BeautifulSoup in Python to get a div class that contains another specific div class?

Time: 2016-07-02 10:51:55

Tags: python beautifulsoup

I have code like this:

<div class = first>
   <div class = second>
   <div class = fourth>
<div class = first>
   <div class = second>
   <div class = third> 
<div class = first>
   <div class = second>
   <div class = fourth>

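The information in div third differs from that in div fourth, but what I need is the information in the div second that sits inside the same div first as div third. In other words, I need to use div third to identify the correct div second.

To be more precise: I need to do the following: if a div.first contains a div.third, store that element's div.second (or another specific div) in a variable, so that I can grab some text from it.

I have already tried find_all as well as parent and children operations, but I cannot work it out. Any help would be greatly appreciated.

Update (example): an example from a flight search site: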

<div class = booking class>
   <div class = price>
   <div class = non refundable>

<div class = booking class>
   <div class = price>
   <div class = refundable>

<div class = booking class>
   <div class = price>
   <div class = non refundable>
When div class = refundable is present, I need the content of div class = price. I hope this makes it clearer.

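1 Answer:

Answer 0 (score: 0)

I think that when you say "contains" you mean that the price div is the parent of the refundable div. If so, look for the refundable div and check whether its parent is a div with the class name price; if it is, you have what you need: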

html = """

<div class="booking class">
   <div class="price">
        <div class="refundable"></div>
   </div>
</div>

<div class="booking class">
   <div class="price">
        <div class="non refundable"></div>
   </div>
</div>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")


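# Match only divs whose class list is exactly ["refundable"], so "non refundable" is skipped.
for div in soup.find_all(lambda tag: tag.name == "div" and tag.get('class') == ['refundable']):
    par = div.parent
    # Keep the match only if its direct parent is a div whose class is exactly "price".
    if par and par.name == "div" and par.get("class") == ["price"]:
        print(par)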
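The lambda filter compares the whole class list instead of searching by class name. The answer does not say why, but a likely reason is that BeautifulSoup treats class as a multi-valued attribute, so class="non refundable" is parsed as the two classes non and refundable, and a plain class search for refundable would match it as well. The sketch below illustrates the difference on the nested markup; it uses the built-in html.parser instead of lxml only so that it runs without extra dependencies, and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<div class="booking class">
   <div class="price">
        <div class="refundable"></div>
   </div>
</div>
<div class="booking class">
   <div class="price">
        <div class="non refundable"></div>
   </div>
</div>
"""

# html.parser is an assumption made here for portability; the answer uses lxml.
soup = BeautifulSoup(html, "html.parser")

# A plain class search matches any tag that has "refundable" among its classes.
loose = soup.find_all("div", class_="refundable")

# Comparing the full class list keeps only tags whose class is exactly "refundable".
exact = soup.find_all(lambda tag: tag.name == "div" and tag.get("class") == ["refundable"])

loose_classes = [d["class"] for d in loose]
exact_classes = [d["class"] for d in exact]

print(loose_classes)
print(exact_classes)
```

If the real page never combines refundable with other class names, the simpler class search may be enough; the exact comparison is the safer choice for the markup shown in the question.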

If they are siblings:

html = """
<div class ="booking class">
   <div class="price"> </div>
   <div class="refundable"></div>
 </div>

<div class ="booking class">
   <div class="price"> </div>
   <div class="non refundable"></div>
 </div>
 <div class ="booking class">
   <div class="price"> </div>
   <div class="non refundable"></div>
 </div>

"""

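If the price div always comes before the refundable div, use find_previous_sibling:

# Re-parse so that soup reflects the sibling markup above.
soup = BeautifulSoup(html, "lxml")

for div in soup.find_all(lambda tag: tag.name == "div" and tag.get('class') == ['refundable']):
    sib = div.find_previous_sibling("div", "price")
    if sib:
        print(div.parent)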

Or, if the price div may come before or after it, check both directions:

for div in soup.find_all(lambda tag: tag.name == "div" and tag.get('class') == ['refundable']):
    sib = div.find_previous_sibling("div", "price") or div.find_next_sibling("div", "price")
    if sib:
        print(div.parent)

If the price div can be anywhere inside the parent:

for div in soup.find_all(lambda tag: tag.name == "div" and tag.get('class') == ['refundable']):
    sib = div.parent.find("div", "price")
    if sib:
        print(div.parent)
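Since the question asks for the content of the price div rather than the whole booking block, the last variant can be wrapped in a small helper that returns the price text. This is only a sketch under a few assumptions: the function name refundable_prices, the constant SAMPLE_HTML, and the price values 100, 80 and 60 are invented for illustration, and html.parser is used in place of lxml so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the price values are made up for illustration.
SAMPLE_HTML = """
<div class="booking class">
   <div class="price">100</div>
   <div class="refundable"></div>
</div>
<div class="booking class">
   <div class="price">80</div>
   <div class="non refundable"></div>
</div>
<div class="booking class">
   <div class="price">60</div>
   <div class="non refundable"></div>
</div>
"""


def is_refundable_div(tag):
    """True only for divs whose class list is exactly ["refundable"]."""
    return tag.name == "div" and tag.get("class") == ["refundable"]


def refundable_prices(html):
    """Return the text of each price div that shares a parent with a refundable div."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for div in soup.find_all(is_refundable_div):
        # Look for a price div anywhere inside the same parent element.
        price = div.parent.find("div", "price")
        if price:
            prices.append(price.get_text(strip=True))
    return prices


print(refundable_prices(SAMPLE_HTML))
```

Searching from div.parent covers both the sibling layout and a price div nested deeper in the same booking block. It does not cover the first layout, where price is itself the parent of refundable, because find only looks at descendants.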