Replace a word everywhere except inside <a> tags

Date: 2019-05-27 09:00:45

Tags: python regex django

For example, I have the string below. I need to replace the word 'game' everywhere except inside the <a> tag:

"<p class='glossary_hover'> cricket is good game <li> hello game </li>  <a href='https://game.in'> game of thrones </a> need to replace game word </p>"

The expected result is:

"<p class='glossary_hover'> cricket is good game2 <li> hello game2 </li>  <a href='https://game.in'> game of thrones </a> need to replace game2 word </p>"

Here 'game' has been replaced with 'game2' everywhere outside the <a> tag.

1 answer:

Answer 0 (score: 0)

Using re:

#!/usr/bin/env python3
import re

s = "<p class='glossary_hover'> cricket is good game <li> hello game </li>  <a href='https://game.in'> game of thrones </a> need to replace game word </p>"

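# Maps each <a> element, as it will look after the global replacement,
# back to its original text.
mapping = {}

# Non-greedy ".*?" so that each <a>...</a> element is matched on its own;
# a greedy ".*" would span from the first <a> to the last </a>.
for a in re.findall(r"<a\b[^>]*>.*?</a>", s):
    mapping[a.replace("game", "game2")] = a

# Replace everywhere, including inside the <a> elements.
s = s.replace("game", "game2")

# Put the original <a> elements back.
for a_game2, a_original in mapping.items():
    s = s.replace(a_game2, a_original)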

print(s)

Using bs4:

#!/usr/bin/env python3
from bs4 import BeautifulSoup

s = "<p class='glossary_hover'> cricket is good game <li> hello game </li>  <a href='https://game.in'> game of thrones </a> need to replace game word </p>"

soup = BeautifulSoup(s, "html.parser")
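# Maps each <a> element, as it will look after the global replacement,
# back to its original text.
mapping = {}

for a_tag in soup.find_all("a"):
    # bs4 serializes attributes with double quotes; convert them back so the
    # text matches the source. This only works because the source string uses
    # single quotes and the link text contains no double quotes.
    a = str(a_tag).replace("\"", "'")
    mapping[a.replace("game", "game2")] = a

# Replace everywhere, including inside the <a> elements.
s = s.replace("game", "game2")

# Put the original <a> elements back.
for a_game2, a_original in mapping.items():
    s = s.replace(a_game2, a_original)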

print(s)

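Explanation: in both examples we create a dictionary named mapping and store every <a> element in it. The key is the element with game replaced by game2; the value is the original string.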

This lets us replace game with game2 across the whole string, and then run a second replacement that puts back everything previously found in the <a> tags.
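The replace-then-restore idea can also be collapsed into a single pass. The following is only a sketch of an alternative, not part of the original answer: it assumes the <a> elements are not nested and that a regular expression is good enough for this input. The pattern matches either a whole <a> element or the bare word, and a callback (the name keep_anchor_or_replace is made up for this illustration) decides what to return.

```python
import re

s = "<p class='glossary_hover'> cricket is good game <li> hello game </li>  <a href='https://game.in'> game of thrones </a> need to replace game word </p>"

# Alternative 1 (captured as group 1): a complete <a>...</a> element.
# Alternative 2: the word to replace.
# Because alternative 1 is tried first, any "game" inside an <a> element is
# consumed as part of that element and never reaches alternative 2.
pattern = re.compile(r"(<a\b[^>]*>.*?</a>)|game", re.DOTALL)


def keep_anchor_or_replace(match):
    # Hypothetical helper: return <a> elements unchanged, replace the word otherwise.
    if match.group(1) is not None:
        return match.group(1)
    return "game2"


result = pattern.sub(keep_anchor_or_replace, s)
print(result)
```

This avoids the intermediate dictionary. It still treats HTML as plain text, so for nested or malformed markup a real parser is probably the safer choice.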

Both scripts produce the same result:

<p class='glossary_hover'> cricket is good game2 <li> hello game2 </li>  <a href='https://game.in'> game of thrones </a> need to replace game2 word </p>