Beautiful Soup 4: How to replace a tag with text and another tag?

Date: 2014-11-18 23:55:33

Tags: python html replace beautifulsoup html-parsing

I want to replace a tag with another tag and put the contents of the old tag before the new one. For example:

I want to change this:

<html>
<body>
<p>This is the <span id="1">first</span> paragraph</p>
<p>This is the <span id="2">second</span> paragraph</p>
</body>
</html>

into this:

<html>
<body>
<p>This is the first<sup>1</sup> paragraph</p>
<p>This is the second<sup>2</sup> paragraph</p>
</body>
</html>

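I can easily find all the spans with find_all(), get the number from the id attribute, and replace one tag with another using replace_with(). But how do I replace a tag with text plus a new tag, or insert text before the replacement tag?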
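Following the question's own idea (replace_with() plus inserting text before the new tag), here is a minimal sketch that uses only single-argument calls: replace_with() swaps the span for a new sup, then insert_before() on the sup puts the span's text in front of it. spans_to_footnotes is a hypothetical helper name, and the sketch assumes each span contains plain text only, because get_text() drops any nested tags.

```python
from bs4 import BeautifulSoup

html = """<p>This is the <span id="1">first</span> paragraph</p>
<p>This is the <span id="2">second</span> paragraph</p>"""


def spans_to_footnotes(markup):
    """Hypothetical helper: turn <span id="N">text</span> into text<sup>N</sup>."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all() returns a list, so editing the tree inside the loop is safe
    for span in soup.find_all("span", id=True):
        sup = soup.new_tag("sup")
        sup.string = span["id"]
        text = span.get_text()   # assumption: the span holds plain text only
        span.replace_with(sup)   # swap the <span> for the new <sup>
        sup.insert_before(text)  # put the old text directly before the <sup>
    return str(soup)


print(spans_to_footnotes(html))
```

Because the span's text is copied as a string, any markup nested inside the span would be lost with this variant.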

1 answer:

Answer 0 (score: 6)

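The idea is to find every span tag that has an id attribute (the span[id] CSS selector), use insert_after() to insert a sup tag right after it, and then call unwrap() to replace the span with its contents: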

from bs4 import BeautifulSoup

data = """
<html>
<body>
<p>This is the <span id="1">first</span> paragraph</p>
<p>This is the <span id="2">second</span> paragraph</p>
</body>
</html>
"""

soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly
for span in soup.select('span[id]'):
    # insert a sup tag right after the span
    sup = soup.new_tag('sup')
    sup.string = span['id']
    span.insert_after(sup)

    # replace the span tag with its contents
    span.unwrap()

print(soup)

Prints:

<html>
<body>
<p>This is the first<sup>1</sup> paragraph</p>
<p>This is the second<sup>2</sup> paragraph</p>
</body>
</html>
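One reason to prefer unwrap() over copying the span's text: unwrap() moves the span's children into place unchanged, so nested markup survives. A small check of the same loop, using a hypothetical input where the span contains a <b> tag:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the span holds nested markup, not just text
markup = '<p>This is the <span id="3">very <b>third</b></span> paragraph</p>'

soup = BeautifulSoup(markup, "html.parser")
for span in soup.select("span[id]"):
    sup = soup.new_tag("sup")
    sup.string = span["id"]
    span.insert_after(sup)
    span.unwrap()  # the span's children, including <b>, stay where the span was

print(soup)
# <p>This is the very <b>third</b><sup>3</sup> paragraph</p>
```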