I want to use BeautifulSoup to modify the HTML below by uncommenting a commented-out tag, identified by its ID.
<div class="foo">
cat dog sheep goat
<!--<p id="p1">test</p>-->
<p id="p2">
test
</p>
</div>
This is my expected result:
<div class="foo">
cat dog sheep goat
<p id="p1">test</p>
<p id="p2">
test
</p>
</div>
This is my Python code using BeautifulSoup, but I don't know how to finish it.
from bs4 import BeautifulSoup, Comment
data = """<div class="foo">
cat dog sheep goat
<!--<p id="p1">test</p>-->
<p id="p2">test</p>
</div>"""
soup = BeautifulSoup(data, 'html.parser')
for comment in soup(text=lambda text: isinstance(text, Comment)):
    if 'id="p1"' in comment.string:
        # I don't know how to complete it here.
        # This is my incorrect solution.
        # It will output "&lt;p id="p1"&gt;test&lt;/p&gt;",
        # not "<p id="p1">test</p>"
        comment.replace_with(comment.string.replace("<!--", "").replace("-->", ""))
        break
Any help would be appreciated.
Answer 0 (score: 2)
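You can pass a new soup, rather than a string, to .replace_with(). A plain string is inserted as text and gets escaped on output, whereas a parsed soup is inserted as markup.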
from bs4 import BeautifulSoup,Comment
data = """<div class="foo">
cat dog sheep goat
<!--<p id="p1">test</p>-->
<p id="p2">
test
</p>
</div>"""
soup = BeautifulSoup(data, 'html.parser')
print('Original soup:')
print('-' * 80)
print(soup)
print()
for comment in soup(text=lambda text: isinstance(text, Comment)):
    if 'id="p1"' in comment.string:
        # Parse the comment body into its own soup so it is inserted as markup, not escaped text.
        tag = BeautifulSoup(comment, 'html.parser')
        comment.replace_with(tag)
        break
print('New soup:')
print('-' * 80)
print(soup)
print()
Prints:
Original soup:
--------------------------------------------------------------------------------
<div class="foo">
cat dog sheep goat
<!--<p id="p1">test</p>-->
<p id="p2">
test
</p>
</div>
New soup:
--------------------------------------------------------------------------------
<div class="foo">
cat dog sheep goat
<p id="p1">test</p>
<p id="p2">
test
</p>
</div>
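If you need this for more than one ID, the same idea can be wrapped in a small helper. This is only a sketch: the function name uncomment_by_id and its parameters are my own and not part of bs4. It parses each comment body and looks for the id with find(), so it should not matter whether the attribute was written with single or double quotes.

```python
from bs4 import BeautifulSoup, Comment


def uncomment_by_id(html, tag_id):
    """Hypothetical helper: restore the comment that wraps the tag with id=tag_id."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(string=...) visits text nodes; the lambda keeps only Comment nodes.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Parse the comment body so the id check works on real tags, not raw text.
        fragment = BeautifulSoup(str(comment), "html.parser")
        if fragment.find(id=tag_id) is not None:
            comment.replace_with(fragment)
            break
    return str(soup)


html = """<div class="foo">
<!--<p id='p1'>test</p>-->
<p id="p2">test</p>
</div>"""
result = uncomment_by_id(html, "p1")
print(result)
```

Only the first matching comment is restored; drop the break if several comments may contain the same ID.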
Answer 1 (score: 0)
Have you considered just using a regular expression instead of bs4?
Maybe this can help you get started.
>>> import re
>>> re.search("<!--((.*)p1(.*))-->", '<!--<p id="p1">test</p>-->')
<re.Match object; span=(0, 26), match='<!--<p id="p1">test</p>-->'>
>>> re.search("<!--((.*)p1(.*))-->", '<!--<p id="p1">test</p>-->').group(1)
'<p id="p1">test</p>'
>>> regex = re.compile("<!--((.*)p1(.*))-->")
>>> regex.sub('<p id="p1">test</p>', '<!--<p id="p1">test</p>-->')
'<p id="p1">test</p>'
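The session above hard-codes the replacement string and uses a greedy .*, which could run across several comments in a larger document. A slightly more general sketch is below. The helper name uncomment_by_id_regex is made up for illustration, and this is still a text-level heuristic rather than a real HTML parse, so treat it as a starting point only. It keeps the match inside a single comment and substitutes the captured body back with \1.

```python
import re


def uncomment_by_id_regex(html, tag_id):
    """Hypothetical helper: strip the comment delimiters around a body that contains id=tag_id."""
    # Any run of characters that does not cross the closing "-->" of the comment.
    body = r"(?:(?!-->).)*?"
    # Accept either single or double quotes around the id value.
    quote = "[\"']"
    pattern = re.compile(
        "<!--(" + body + r"\bid=" + quote + re.escape(tag_id) + quote + body + ")-->",
        re.DOTALL,
    )
    # \1 is the captured comment body, so the markup is kept and only the delimiters go.
    return pattern.sub(r"\1", html)


html = '<!-- note --><!--<p id="p1">test</p>--><p id="p2">x</p>'
print(uncomment_by_id_regex(html, "p1"))
```

Comments that do not mention the requested ID are left untouched, and an ID that appears nowhere returns the input unchanged.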