I'm trying to call .get_text() on the results of find_all to extract the content of div and meta tags on a website, like this:
from bs4 import BeautifulSoup as soup
# skipped some lines
names = bs_obj.find_all("div", {'class':'classname'})
for name in names:
    print(name.get_text()+"\n")
Suppose the div tags contain:
<div class="classname">content1</div>
<div class="classname">content2</div>
The result I expect is:
content1
content2
But the actual output is:
<div class="classname">content1</div>
<div class="classname">content2</div>
I tried a few approaches, such as split(), replace(), and re.search(), but the tags don't go away. Any idea what's going on?
Answer 0 (score: 1)
You were almost there:
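from bs4 import BeautifulSoup as soup

html_doc = """
<div class="classname">content1</div>
<div class="classname">content2</div>
"""
# Parse the markup with Python's built-in HTML parser
bs_obj = soup(html_doc, 'html.parser')
# find_all is the current name; findAll is a legacy alias
names = bs_obj.find_all('div', {'class':'classname'})
for name in names:
    # .text returns only the text inside the tag, without the markup
    print(name.text)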
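To see the difference between the expected and the reported output, here is a minimal sketch that compares get_text() with str() on the same tags. It assumes the sample markup from the question; the variable names (divs, texts, markup) are only illustrative. The question does not show enough code to confirm the actual cause, but output that still contains the tags looks like what printing the tag object itself produces, rather than its text.

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="classname">content1</div>
<div class="classname">content2</div>
"""

# Assumption: the built-in 'html.parser' is used, as in the answer above.
bs_obj = BeautifulSoup(html_doc, "html.parser")
divs = bs_obj.find_all("div", class_="classname")

# get_text() returns only the text inside each tag.
texts = [div.get_text(strip=True) for div in divs]

# str() on a Tag returns the full markup, including the tag itself.
markup = [str(div) for div in divs]

for text in texts:
    print(text)
```

If the loop in the real script prints the full markup, it is worth checking that get_text() (or .text) is called on each element and that the printed variable is the extracted text rather than the tag.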