Change only the parent element's own text in BeautifulSoup

Time: 2015-02-17 14:15:46

Tags: python beautifulsoup

I have this code:

from bs4 import BeautifulSoup

txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""
soup = BeautifulSoup(txt)
for ft in soup.findAll('p'):
    # Uppercases the whole serialized tag, including the tag names
    print(str(ft).upper())

When I run it, I get:

<P>HI <SPAN>MARK</SPAN>, HOW ARE YOU?, DON'T FORGET MEETING ON <STRONG>SUNDAY</STRONG>, OK?</P>

But I want to get this:

<p>HI <span>Mark</span>, HOW ARE YOU?, DON'T FORGET MEETING ON <strong>sunday</strong>, OK?</p>

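I only want to change the text that belongs directly to the p tag, while leaving the content of the other tags nested inside p untouched. I also want the tag names to stay lowercase.

Thanks.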

1 Answer:

Answer 0 (score: 1)

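You can assign the modified text to the string attribute of the tag, p.string. So loop over all the contents of the <p> tag, use the regular expression module to check whether each item starts with a tag delimited by < and >, and leave those items unchanged. Something like:

from bs4 import BeautifulSoup
import re

txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""
soup = BeautifulSoup(txt)
for p in soup.find_all('p'):
    # Uppercase the plain-text children; keep children that serialize as tags as they are
    p.string = ''.join(
        [str(t).upper() if not re.match(r'<[^>]+>', str(t)) else str(t)
         for t in p.contents])
# formatter=None turns off entity escaping when the document is printed
print(soup.prettify(formatter=None))

I use the formatter=None option to avoid the encoding of HTML special symbols such as < and > in the output.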
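Note that assigning to p.string replaces the children of <p> with a single text node, so the inner tags survive only as literal text and show up as markup because escaping is switched off. If you would rather keep them as real tags in the tree, a possible alternative is to replace only the text nodes that are direct children of <p>. The following is a minimal sketch, not part of the original answer. It assumes Python 3 and bs4 with the built-in html.parser, and the helper name upper_direct_text is made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

txt = """<p>Hi <span>Mark</span>, how are you?, Don't forget meeting on <strong>sunday</strong>, ok?</p>"""


def upper_direct_text(html):
    """Uppercase text that sits directly inside <p>, leaving nested tags alone.

    Hypothetical helper for illustration; assumes the built-in html.parser.
    """
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p"):
        # Copy the children into a list first, because replace_with
        # modifies the tree while we walk over it.
        for child in list(p.children):
            # Direct text children are NavigableString objects;
            # <span> and <strong> are Tag objects and are skipped.
            if isinstance(child, NavigableString):
                child.replace_with(child.upper())
    return str(soup)


print(upper_direct_text(txt))
# <p>HI <span>Mark</span>, HOW ARE YOU?, DON'T FORGET MEETING ON <strong>sunday</strong>, OK?</p>
```

Because the tree is edited node by node, no regular expression is needed and the default output formatter can stay on.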