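I'm struggling to figure out which element I need to point Beautiful Soup at in order to extract the "amount" value ("1,56" in this code example).
I've pasted an excerpt of the web page's code below: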
<td class="line-content">
<span class="html-tag">
<div
<span class="html-attribute-name">
class
</span>
='
<span class="html-attribute-value">
the-price
</span>
'
<span class="html-attribute-name">
style
</span>
='
<span class="html-attribute-value">
margin-top:20px;
</span>
'>
</span>
</td>
</tr>
<tr>
<td class="line-number" value="447">
</td>
<td class="line-content">
<span class="html-tag">
<span
<span class="html-attribute-name">
class
</span>
='
<span class="html-attribute-value">
currency
</span>
'>
</span>
€
<span class="html-tag">
</span>
</span>
<span class="html-tag">
<span
<span class="html-attribute-name">
class
</span>
='
<span class="html-attribute-value">
amount
</span>
'>
</span>
1,56
<span class="html-tag">
</span>
</span>
</td>
</tr>
Could you please enlighten me? I'd really appreciate your help.
Answer 0: (score: 1)
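For example, you can locate the amount like this (data is your HTML string):
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')
# Find the <span> whose entire text is the attribute value "amount"
span_with_amount = soup.find(lambda tag: tag.name == 'span' and tag.get_text(strip=True) == 'amount')
# The value is the first text node after that span's parent;
# string=True is the current name for the older text=True argument
value = span_with_amount.parent.find_next_sibling(string=True)
print(value.strip())
Prints:
1,56
First we find the <span> whose text is "amount", then we take the text that comes right after that <span>'s parent.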
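The excerpt in the question looks like a browser's "view source" rendering, where the page's real tags appear as escaped text inside html-tag spans. The sketch below is only an illustration of the approach above on a self-contained sample. The sample strings and the helper names amount_from_view_source and amount_from_page are hypothetical, and the second helper assumes you can parse the original page, where the value would sit in a real span with class "amount".

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the "view source" markup in the question.
VIEW_SOURCE_HTML = """
<table><tr><td class="line-content">
<span class="html-tag">&lt;span <span class="html-attribute-name">class</span>='<span class="html-attribute-value">currency</span>'&gt;</span>€<span class="html-tag">&lt;/span&gt;</span>
<span class="html-tag">&lt;span <span class="html-attribute-name">class</span>='<span class="html-attribute-value">amount</span>'&gt;</span>1,56<span class="html-tag">&lt;/span&gt;</span>
</td></tr></table>
"""

# Assumed shape of the real page that the view-source excerpt describes.
PAGE_HTML = """
<div class='the-price' style='margin-top:20px;'>
<span class='currency'>€</span><span class='amount'>1,56</span>
</div>
"""


def amount_from_view_source(html):
    """Return the text following the html-tag span that mentions 'amount'."""
    soup = BeautifulSoup(html, "html.parser")
    # The innermost span whose whole text is exactly "amount".
    marker = soup.find(
        lambda tag: tag.name == "span" and tag.get_text(strip=True) == "amount"
    )
    if marker is None:
        return None
    # The value is the first text node after the marker's parent span.
    value = marker.parent.find_next_sibling(string=True)
    return value.strip() if value is not None else None


def amount_from_page(html):
    """Return the text of the first <span class="amount"> on the real page."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.select_one("span.amount")
    return span.get_text(strip=True) if span is not None else None


print(amount_from_view_source(VIEW_SOURCE_HTML))  # -> 1,56
print(amount_from_page(PAGE_HTML))                # -> 1,56
```

Both helpers return None when nothing matches, so the caller can tell a missing price apart from an empty one.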