Python regex: replace all occurrences with the matched string

Time: 2016-04-14 17:54:36

Tags: python regex

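I have a file that I am trying to display in a jinja template. I am trying to replace every string of the form negxxx string xxxneg with <span class="SomeCssClass_neg_xxx"> string </span>. The problem is with the match group number \1 that I am using. I know I have multiple matches, not just one. I need some help.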

import re
StringIn = 'negxxx data1 xxxneg  out of span negxxx data2 xxxneg negzzz data1 zzzneg  out of span negzzz data2 zzzneg'
StringIn = re.sub(r"negxxx(.*)xxxneg", r"<span class='neg_xxx'>\1</span>" , StringIn)
StringIn = re.sub(r"negzzz(.*)zzzneg", r"<span class='neg_zzz'>\1</span>" , StringIn)
print(StringIn)

I get:

<span class='neg_xxx'> data1 xxxneg  out of span negxxx data2 </span> <span class='neg_zzz'> data1 zzzneg  out of span negzzz data2 </span>

That is incorrect. What I need is:

<span class='neg_xxx'> data1 </span>   out of span <span class='neg_xxx'> data2 </span><span class='neg_zzz'> data1 </span>  out of span <span class='neg_zzz'> data2 </span>

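1 answer:

Answer 0: (score: 0)

Your .* is greedy: it races to the end of the string and then backtracks only as far as the last "xxxneg" (the one nearest the end of the string). Use the lazy quantifier .*? instead, which consumes one character at a time and, after each one, tries to match the rest of the pattern: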

import re
StringIn = 'negxxx data1 xxxneg  out of span negxxx data2 xxxneg negzzz data1 zzzneg  out of span negzzz data2 zzzneg'
StringIn = re.sub(r"negxxx(.*?)xxxneg", r"<span class='neg_xxx'>\1</span>" , StringIn)
StringIn = re.sub(r"negzzz(.*?)zzzneg", r"<span class='neg_zzz'>\1</span>" , StringIn)
print(StringIn)
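
To see the difference between the two quantifiers directly, the sketch below runs both patterns through re.findall on the same input. It also shows one possible way to handle both markers in a single pass by capturing the marker suffix and reusing it with a backreference. The variable names (text, greedy, lazy, pattern, result) and the single-pass pattern are illustrative assumptions, not part of the original answer.

```python
import re

text = ('negxxx data1 xxxneg  out of span negxxx data2 xxxneg '
        'negzzz data1 zzzneg  out of span negzzz data2 zzzneg')

# Greedy: .* runs to the end and backtracks to the LAST "xxxneg",
# so there is a single match spanning both marked regions.
greedy = re.findall(r"negxxx(.*)xxxneg", text)

# Lazy: .*? stops at the FIRST "xxxneg" it can reach,
# so each marked region is matched separately.
lazy = re.findall(r"negxxx(.*?)xxxneg", text)

print(greedy)
print(lazy)

# Assumed generalization: capture the suffix (xxx or zzz) as group 1 and
# require the same suffix in the closing marker via the backreference \1.
pattern = re.compile(r"neg(xxx|zzz)(.*?)\1neg")
result = pattern.sub(r"<span class='neg_\1'>\2</span>", text)
print(result)
```

Note that . does not match a newline by default, so if a marked region in the file can span several lines, passing flags=re.DOTALL to re.compile (or to re.sub) would be needed.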