Question

I'm having trouble parsing HTML with Python and Beautiful Soup: I need to extract one very specific piece of data. This is the kind of markup I'm dealing with:

<div class="big_div">
   <div class="smaller div">
      <div class="other div">
         <div class="this">A</div>
         <div class="that">2213</div>
      <div class="other div">
         <div class="this">B</div>
         <div class="that">215</div>
      <div class="other div">
         <div class="this">C</div>
         <div class="that">253</div>

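The HTML repeats the same structure and only the values differ, so my problem is picking out one specific value. I want to find the 253 in the last div. I'd appreciate any help, since this comes up a lot when parsing HTML.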

Thanks in advance!

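So far I've been trying to parse it, but because the class names are all the same I don't know how to navigate to the right element. I also tried a for loop, but made almost no progress.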

Answer 1

You can pass the string argument to find. See the BS docs for the string argument.

from bs4 import BeautifulSoup

# html holds the HTML source of the page you want to scrape;
# req_text is the exact text you want to find.
soup = BeautifulSoup(html, 'lxml')
req_div = soup.find('div', string=req_text)

req_div will contain the div element you want.
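
Applied to the markup from the question, that might look roughly like the sketch below. A few assumptions on my part: the divs are closed properly (the snippet in the question leaves them open), the built-in html.parser is used instead of lxml so nothing extra has to be installed, and the variable names are only illustrative. It shows three ways to get at the 253: matching the text directly, finding the "C" label and stepping to its sibling, and taking the last div with class "that".

```python
from bs4 import BeautifulSoup

# Sample markup based on the question, with closing tags added (assumption).
html = """
<div class="big_div">
  <div class="smaller div">
    <div class="other div">
      <div class="this">A</div>
      <div class="that">2213</div>
    </div>
    <div class="other div">
      <div class="this">B</div>
      <div class="that">215</div>
    </div>
    <div class="other div">
      <div class="this">C</div>
      <div class="that">253</div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: match the div whose only content is the exact text "253".
by_text = soup.find("div", string="253")

# Option 2: find the label "C", then move to the sibling div with class "that".
label = soup.find("div", class_="this", string="C")
value = label.find_next_sibling("div", class_="that").get_text(strip=True)

# Option 3: collect every div with class "that" and take the last one.
last_value = soup.find_all("div", class_="that")[-1].get_text(strip=True)

print(by_text.get_text(), value, last_value)
```

Option 1 only helps if you already know the value. If what you know is the label or the position, options 2 and 3 are probably closer to what you're after.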

Python HTML parsing with BS4
