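Here is my HTML. I am trying to get the value that follows the <br> tag, but when I try, I get both prices back. How can I do this with Beautiful Soup? Any help would be appreciated.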
<div class="col search_price discounted responsive_secondrow">
<span style="color: #888888;"><strike>CDN$ 2.29</strike></span>
<br>CDN$ 1.48
</div>
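Answer 0 (score: 0)
You basically have it already. Pass an attrs dictionary to select the div with the right class, then find the next 'br' tag; its next sibling is the text you want: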
from bs4 import BeautifulSoup as bs
HTML = """
<div class="col search_price discounted responsive_secondrow">
<span style="color: #888888;"><strike>CDN$ 2.29</strike></span>
<br>CDN$ 1.48
</div>
"""
soup = bs(HTML, 'html.parser')
# get all divs with your class attr
divs = soup.find_all("div", attrs={'class': 'col search_price discounted responsive_secondrow'})
for div in divs:
    # find the <br> tag; its next_sibling is the text node that follows it
    print(div.find_next('br').next_sibling.strip())
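The same idea can be sketched a little more defensively. This version is an illustration rather than part of the original answer: the function name `discounted_prices` is hypothetical, the CSS selector `div.search_price.discounted` is an assumption about which classes identify a price cell, and it skips any div that has no `<br>` or no text after it.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="col search_price discounted responsive_secondrow">
<span style="color: #888888;"><strike>CDN$ 2.29</strike></span>
<br>CDN$ 1.48
</div>
"""


def discounted_prices(html):
    """Return the text that follows <br> in each discounted price div.

    `discounted_prices` is a hypothetical helper name used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    # The CSS selector matches on individual classes, so class order
    # and extra classes on the div do not matter.
    for div in soup.select("div.search_price.discounted"):
        br = div.find("br")  # search only inside this div
        if br is None:
            continue
        text = br.next_sibling
        # next_sibling may be None or another tag; keep only non-empty text
        if isinstance(text, str) and text.strip():
            prices.append(text.strip())
    return prices


print(discounted_prices(HTML))
```

Using `div.find("br")` instead of `find_next` keeps the search inside the current div, so a div without a `<br>` cannot accidentally pick up one from later in the page.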
Answer 1 (score: 0)
Another solution, using the simplified_scrapy library:
from simplified_scrapy.simplified_doc import SimplifiedDoc
html='''
<div class="col search_price discounted responsive_secondrow">
<span style="color: #888888;"><strike>CDN$ 2.29</strike></span>
<br>CDN$ 1.48
</div>
'''
doc = SimplifiedDoc(html)
divs = doc.getElementsByClass('col search_price discounted responsive_secondrow')
for div in divs:
    value = div.br.nextText()  # first
    print(value)
    value = doc.html[div.br._end:div._end-6]  # second
    print(value)
    value = doc.removeHtml(div.getSectionByReg('<br.*>.*'))  # third
    print(value)
    value = div.removeElement('span')  # fourth
    print(value.text)
Result:
CDN$ 1.48
CDN$ 1.48
CDN$ 1.48
CDN$ 1.48
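For comparison, a similar "ignore the span" approach can be sketched in Beautiful Soup by collecting only the div's direct text nodes. This is an illustrative alternative, not something either answer proposes; the helper name `direct_text` is hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString

HTML = """
<div class="col search_price discounted responsive_secondrow">
<span style="color: #888888;"><strike>CDN$ 2.29</strike></span>
<br>CDN$ 1.48
</div>
"""


def direct_text(tag):
    """Join the non-empty text nodes that are direct children of `tag`.

    Text nested inside child tags (here, the struck-through old price
    inside <span>) is ignored. `direct_text` is a hypothetical helper.
    """
    parts = [
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString) and child.strip()
    ]
    return " ".join(parts)


soup = BeautifulSoup(HTML, "html.parser")
for div in soup.select("div.search_price.discounted"):
    print(direct_text(div))
```

Unlike removing the span, this leaves the parsed tree untouched, which matters if the old price is needed later.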