I am trying to remove a span tag nested inside another span tag, but I have not found a solution yet. The script I tried is as follows:
from urllib.request import urlopen
from bs4 import BeautifulSoup

request = 'http://urltargethere/adeas/asd'
r = urlopen(request).read()
sew = BeautifulSoup(r, 'lxml')
results = sew.find_all("span", {"class": "titles"})
for x in results:
    print('text ==>', x)
The printed result is:
<span class="titles"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
<span class="titles"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
<span class="titles"><span class="times">3 hour ago</span>Morol eiusmodtempor incididunt.</span>
The result I am looking for is:
Lorem ipsum dolor sit amet.
Tara enim ad minim veniam.
Morol eiusmodtempor incididunt.
Answer 0 (score: 1)
This may help.
Demo:
from bs4 import BeautifulSoup
a = '<span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.'
soup = BeautifulSoup(a, 'html.parser')
for tag in soup.find_all("span", {'class': 'times'}):
    tag.replace_with('')  # swap the nested span for an empty string
print(soup.get_text())
Result:
Lorem ipsum dolor sit amet.
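The demo above handles a single fragment. As a rough sketch of how the same idea might be applied to all three spans from the question, the version below uses decompose() to drop the nested tag rather than replacing it with an empty string. The helper name titles_without_times and the html variable are made up for illustration; the markup is copied from the question.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (outer class "titles").
html = '''
<span class="titles"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
<span class="titles"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
<span class="titles"><span class="times">3 hour ago</span>Morol eiusmodtempor incididunt.</span>
'''

def titles_without_times(markup):
    """Hypothetical helper: text of each "titles" span minus its nested "times" span."""
    soup = BeautifulSoup(markup, 'html.parser')
    texts = []
    for title in soup.find_all('span', {'class': 'titles'}):
        for nested in title.find_all('span', {'class': 'times'}):
            nested.decompose()  # remove the nested tag and its text from the tree
        texts.append(title.get_text(strip=True))
    return texts

for text in titles_without_times(html):
    print(text)
```

Note that this modifies the parsed tree, so the "times" spans are gone afterwards; that is fine if you only need the remaining text.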
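Answer 1 (score: 1)
If you only want the trailing text of the span with class title, '.contents' returns a list of the span's children (the times span and the text node), so you can index the one you want: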
from bs4 import BeautifulSoup
soup = BeautifulSoup('''\
<span class="title"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
<span class="title"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
<span class="title"><span class="times">3 hour ago</span>Morol eiusmodtempor incididunt.</span>''','html.parser')
for s in soup.find_all('span', {'class': 'title'}):
    print(s.contents[1])  # index 0 is the times span, index 1 is the text
Output:
Lorem ipsum dolor sit amet.
Tara enim ad minim veniam.
Morol eiusmodtempor incididunt.
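Indexing with contents[1] relies on the text always being the second child. If that position might vary, one possible variation is to collect only the text nodes that are direct children of the outer span, which leaves the nested span alone and does not modify the tree. This is a sketch, not part of the answer above; direct_text and own_texts are hypothetical names.

```python
from bs4 import BeautifulSoup

# Same sample markup as in this answer (outer class "title").
html = '''
<span class="title"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
<span class="title"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
<span class="title"><span class="times">3 hour ago</span>Morol eiusmodtempor incididunt.</span>
'''

def direct_text(tag):
    """Hypothetical helper: join only the text nodes that are direct children of tag."""
    # recursive=False limits the search to direct children, so text inside
    # the nested "times" span is skipped.
    return ''.join(tag.find_all(string=True, recursive=False)).strip()

soup = BeautifulSoup(html, 'html.parser')
own_texts = [direct_text(s) for s in soup.find_all('span', {'class': 'title'})]
for text in own_texts:
    print(text)
```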
Answer 2 (score: 1)
Try this to get rid of the part you do not want to keep:
content="""
<span class="title"><span class="times">1 hour ago</span>Lorem ipsum dolor sit amet.</span>
<span class="title"><span class="times">2 hour ago</span>Tara enim ad minim veniam.</span>
<span class="title"><span class="times">3 hour ago</span>Morol eiusmodtempor incididunt.</span>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(content,"lxml")
for item in soup.find_all(class_="title"):
    # Pull every nested "times" element out of the tree before reading the text.
    for tag in item.find_all(class_="times"):
        tag.extract()
    print(item.text)
Output:
Lorem ipsum dolor sit amet.
Tara enim ad minim veniam.
Morol eiusmodtempor incididunt.