Suppose I have:
text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
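I want to replace the opening <a href ...> tag and the closing </a> tag with a single space (" "). By the way, it is a BeautifulSoup.BeautifulSoup object, so the normal str.replace() won't work.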
I want the text to end up as just
""" Hello There """
Note the spaces before and after "Hello There".
Answer 0 (score: 3)
You can use replaceWith() (or its PEP 8 name, replace_with()):
from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<html>
<body>
<a href = 'http://www.crummy.com/software'>Hello There</a>
</body>
</html>
""", "html.parser")

for a in soup.find_all('a'):
    # Swap the whole <a> tag for its text, padded with one space on each side
    a.replace_with(" %s " % a.string)

print(soup)
Prints something like this (the exact whitespace depends on the parser):
<html>
<body>
 Hello There 
</body>
</html>
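One caveat with a.string: it is None when the link has more than one child (for example <a><b>Hello</b> There</a>), so the loop above would insert the text "None". A minimal sketch, assuming html.parser and a hypothetical helper named replace_links_with_text, that uses get_text() instead so text from nested tags is kept:

```python
from bs4 import BeautifulSoup


def replace_links_with_text(html):
    """Replace every <a> tag with its text, padded by one space on each side."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list, so replacing tags while looping is safe
    for a in soup.find_all("a"):
        # get_text() joins the text of all descendants, unlike a.string,
        # which is None when the tag has several children
        a.replace_with(" %s " % a.get_text())
    return str(soup)


print(replace_links_with_text("<p><a href='#'><b>Hello</b> There</a></p>"))
# <p> Hello There </p>
```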
Answer 1 (score: 2)
Use .replace_with() together with the .text attribute:
>>> from bs4 import BeautifulSoup as BS
>>> text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
>>> soup = BS(text, "html.parser")
>>> mytag = soup.find('a')
>>> mytag.replace_with(mytag.text + ' ')
<a href="http://www.crummy.com/software">Hello There</a>
>>> print(soup)
 Hello There 
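A different approach, not taken from the answers here: Tag.unwrap() removes a tag but keeps its children, so any nested markup inside the link (such as <b>) survives, while insert_before() and insert_after() add the padding spaces. This is a sketch assuming html.parser; the helper name unwrap_links is hypothetical.

```python
from bs4 import BeautifulSoup


def unwrap_links(html):
    """Remove <a> tags but keep their contents, adding a space on each side."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        a.insert_before(" ")  # space placed just before the link
        a.insert_after(" ")   # space placed just after the link
        a.unwrap()            # replace the tag with its own children
    return str(soup)


print(unwrap_links("<p><a href='#'><b>Hello</b> There</a></p>"))
# <p> <b>Hello</b> There </p>
```

Use this variant when the inner formatting matters; use replace_with() with get_text() when only plain text should remain.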
Answer 2 (score: -1)
>>> import re
>>> text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
>>> notag = re.sub(r"<.*?>", " ", text)
>>> notag
'  Hello There '
Note the two leading spaces: one is already in text, and one replaces the <a> tag.
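Because re.sub(r"<.*?>", " ", ...) replaces each tag on its own, whitespace already next to a tag is kept in addition to the inserted space. A minimal sketch, with a hypothetical helper named strip_tags, that swallows a run of tags together with the whitespace around it and emits a single space. It is still a regex and not an HTML parser: a ">" inside a quoted attribute value, comments, or <script> contents will break it.

```python
import re

# One or more tags, plus any surrounding whitespace, collapse to one space
TAG_RUN = re.compile(r"\s*(?:<[^>]*>\s*)+")


def strip_tags(html):
    return TAG_RUN.sub(" ", html)


text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
print(repr(strip_tags(text)))  # ' Hello There '
```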