I have this code:
txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""
soup = BeautifulSoup(txt)
for ft in soup.findAll('p'):
    print(str(ft).upper())
When I run it, I get:
<P>HI <SPAN>MARK</SPAN>, HOW ARE YOU?, DON'T FORGET MEETING ON <STRONG>SUNDAY</STRONG>, OK?</P>
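But I want to get this:
<p>HI <span>Mark</span>, HOW ARE YOU?, DON'T FORGET MEETING ON <strong>sunday</strong>, ok?</p>
I only want to change the inner text of the p tag while keeping the formatting of the other inner tags inside p. I also want the tag names to stay in lowercase.
Thanks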
Answer 0 (score: 1)
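You can assign the modified text to the string attribute of the tag, p.string. So loop over all the contents of the <p> tag, use the regular expression module to check whether each item contains the tag symbols < and >, and skip those. Something like: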
from bs4 import BeautifulSoup
import re

txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""
soup = BeautifulSoup(txt, 'html.parser')
for p in soup.find_all('p'):
    # Uppercase the text nodes; leave anything that starts with a tag untouched.
    p.string = ''.join(
        [str(t).upper()
         if not re.match(r'<[^>]+>', str(t))
         else str(t)
         for t in p.contents])
# formatter=None stops the < and > in the rebuilt string from being escaped.
print(soup.prettify(formatter=None))
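I use the formatter option to avoid the encoding of HTML special symbols. It yields the p tag with its own text in uppercase and the inner span and strong tags unchanged.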
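A possible alternative, offered only as a sketch and not part of the original answer: instead of rebuilding the markup as a string, replace each direct text child of the p tag in place. This assumes bs4 4.x and the built-in html.parser; the variable name result is introduced here for illustration. Because the inner tags stay real nodes in the tree, no formatter option is needed.

```python
from bs4 import BeautifulSoup, NavigableString

txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""
soup = BeautifulSoup(txt, "html.parser")

for p in soup.find_all("p"):
    # list(...) takes a snapshot, so replacing nodes does not disturb iteration.
    for child in list(p.children):
        # Only plain text nodes that are direct children of <p>;
        # tags (and comments, which subclass NavigableString) are skipped.
        if type(child) is NavigableString:
            child.replace_with(NavigableString(child.upper()))

result = str(soup)  # hypothetical name, used only in this sketch
print(result)
```

Text nested inside span and strong is never visited because only the direct children of p are examined, so Mark and sunday keep their original case and the tag names stay lowercase.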