How do I extract a nested span class value with Beautiful Soup?

Time: 2019-12-02 21:01:48

Tags: python web-scraping beautifulsoup

I'm struggling to figure out which element I need to point Beautiful Soup at in order to pull out the value tagged "amount" (in this code sample, "1,56").

I've pasted an excerpt of the web page's code below:

<td class="line-content">
      <span class="html-tag">
       &lt;div
       <span class="html-attribute-name">
        class
       </span>
       ='
       <span class="html-attribute-value">
        the-price
       </span>
       '
       <span class="html-attribute-name">
        style
       </span>
       ='
       <span class="html-attribute-value">
        margin-top:20px;
       </span>
       '&gt;
      </span>
     </td>
    </tr>
    <tr>
     <td class="line-number" value="447">
     </td>
     <td class="line-content">
      <span class="html-tag">
       &lt;span
       <span class="html-attribute-name">
        class
       </span>
       ='
       <span class="html-attribute-value">
        currency
       </span>
       '&gt;
      </span>
      €
      <span class="html-tag">
       &lt;/span&gt;
      </span>
      <span class="html-tag">
       &lt;span
       <span class="html-attribute-name">
        class
       </span>
       ='
       <span class="html-attribute-value">
        amount
       </span>
       '&gt;
      </span>
      1,56
      <span class="html-tag">
       &lt;/span&gt;
      </span>
     </td>
    </tr>

Could you please enlighten me? I'd really appreciate your help.

1 answer:

Answer 0: (score: 1)

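For example, you can locate the amount like this (data is your HTML string):

from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')

# Find the <span> whose entire text is "amount" (the html-attribute-value span).
span_with_amount = soup.find(lambda tag: tag.name == 'span' and tag.get_text(strip=True) == 'amount')
# Its parent is the html-tag span; the next text-node sibling of that parent holds the value.
# (Newer Beautiful Soup releases also accept string=True in place of text=True.)
value = span_with_amount.parent.find_next_sibling(text=True)
print(value.strip())

Prints:

1,56

First we find the <span> with the text "amount", then we find the text node that comes right after that <span>'s parent.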
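
The same lookup can be wrapped in a small helper that returns None instead of raising when the marker is missing. This is only a sketch: the function names, the trimmed VIEW_SOURCE_HTML sample, and the RAW_HTML string are made up for illustration. The excerpt in the question looks like a browser "view source" rendering; assuming the real page contains <span class='amount'>1,56</span> inside <div class='the-price'>, as that rendering suggests, a plain CSS selector on the raw HTML may be simpler, which the second helper shows.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the view-source style markup from the question
# (illustrative sample, not the full page).
VIEW_SOURCE_HTML = """
<table><tr>
 <td class="line-content">
  <span class="html-tag">&lt;span
   <span class="html-attribute-name">class</span>='<span class="html-attribute-value">amount</span>'&gt;
  </span>
  1,56
  <span class="html-tag">&lt;/span&gt;</span>
 </td>
</tr></table>
"""

# Assumption: what the underlying page might look like before the browser
# rendered it as highlighted source. Not taken from the real site.
RAW_HTML = (
    "<div class='the-price' style='margin-top:20px;'>"
    "<span class='currency'>€</span><span class='amount'>1,56</span></div>"
)


def amount_from_view_source(html):
    """Return the text after the span labelled "amount", or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # The span whose whole text is exactly "amount" is the attribute-value span.
    marker = soup.find(
        lambda tag: tag.name == "span" and tag.get_text(strip=True) == "amount"
    )
    if marker is None:
        return None
    # The value is the next text node after the enclosing html-tag span.
    value = marker.parent.find_next_sibling(string=True)
    return value.strip() if value is not None else None


def amount_from_raw_html(html):
    """Return the amount from raw page markup using a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.select_one("div.the-price span.amount")
    return span.get_text(strip=True) if span is not None else None
```

Calling amount_from_view_source(VIEW_SOURCE_HTML) or amount_from_raw_html(RAW_HTML) returns the string "1,56"; either helper returns None when nothing matches, so the caller can decide how to handle a page whose layout has changed.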