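I want to extract the content of a div tag. I'm using Scrapy to scrape some websites, but the problem is that the same div tag can hold two kinds of content: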
["<div class=\"price\">\n <s>Rs.330</s> <b>Rs.297</b>\n </div>"]
and
["<div class=\"price\">\n Rs.330 \n</div>"]
How can I extract the content from this tag in both cases?
Answer 0 (score: 2)
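import bs4

# First variant from the question: struck-through original price plus discounted price.
html = "<div class=\"price\">\n <s>Rs.330</s> <b>Rs.297</b>\n </div>"
# "html.parser" ships with Python; features="xml" would require lxml to be installed.
soup = bs4.BeautifulSoup(html, features="html.parser")
s = soup.div.s.text  # 'Rs.330' (original price)
b = soup.div.b.text  # 'Rs.297' (discounted price)
# For the second variant soup.div.s is None, so .text would raise AttributeError.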
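
The snippet above covers only the first variant. A minimal sketch that handles both, assuming the markup always looks like one of the two examples in the question; the helper name `extract_prices` and its return shape are hypothetical, not part of the original answer:

```python
import bs4


def extract_prices(html):
    """Return (original, discounted); discounted is None when the div holds plain text."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="price")
    if div is None:
        # No price block at all in this fragment.
        return None, None
    s_tag, b_tag = div.find("s"), div.find("b")
    if s_tag is not None and b_tag is not None:
        # Variant 1: <s> holds the original price, <b> the discounted one.
        return s_tag.get_text(strip=True), b_tag.get_text(strip=True)
    # Variant 2: the div contains only the price as text.
    return div.get_text(strip=True), None


discounted_html = "<div class=\"price\">\n <s>Rs.330</s> <b>Rs.297</b>\n </div>"
plain_html = "<div class=\"price\">\n Rs.330 \n</div>"
print(extract_prices(discounted_html))
print(extract_prices(plain_html))
```

Returning a pair keeps the caller from having to test which variant it received; each string in the lists shown in the question could be passed to the helper one at a time.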