Replace each match with a different string

Date: 2017-05-04 09:54:31

Tags: python python-2.7

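I want to wrap the text inside the following string's tags in link tags. I'm using re.sub, and that works, but I also need each of the two link tags to have a different ID. How can I do that?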

input = "<span>Replace this</span> and <span>this</span>"
result = re.compile(r'>(.*?)<', re.I).sub(r'><a id="[WHAT TO PUT HERE?]" class="my_class">\1</a><', input)

The output should have a different ID on each link tag:

'<span><a id="id1" class="my_class">Replace this</a></span> and <span><a id="id2" class="my_class">this</a></span>'

1 answer:

Answer 0 (score: 1)

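As the link from Christian König says, parsing HTML with regular expressions is generally not a wise idea. If you are very careful, and the HTML is relatively simple and stable, you can sometimes get away with it, but your code may break if the format of the pages you are parsing changes. But anyway...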

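The pattern given in the question doesn't work: it will also perform a substitution on "> and <", i.e. on the text between the two spans.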
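A quick check makes the problem concrete. This sketch runs the question's pattern through re.findall on the question's input (the variable name captured is just illustrative); it runs under both Python 2.7 and Python 3:

```python
import re

# The pattern from the question: any text between a '>' and the next '<'
pat = re.compile(r'>(.*?)<', re.I)

s = "<span>Replace this</span> and <span>this</span>"

# findall returns every captured group, including the text *between* the spans
captured = pat.findall(s)
print(captured)
```

The middle item, ' and ', comes from the closing '>' of the first </span> up to the '<' of the second <span>, so re.sub would wrap it in a link tag as well.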

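Here's one way to do what you want. We pass a function as the repl argument to re.sub, and we give that function a counter (as a function attribute) so it knows which id number to use. The counter is incremented on every replacement, but you can set it to any value you want before calling re.sub.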

import re

pat = re.compile(r'<span>(.*?)</span>', re.I)

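def repl(m):
    # Build the replacement for one <span>...</span> match,
    # using the current counter value as the id number
    fmt = '<span><a id="id{}" class="my_class">{}</a></span>'
    result = fmt.format(repl.count, m.group(1))
    repl.count += 1  # the next match gets the next id
    return result
repl.count = 1  # function attribute used as the counter; ids start at 1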

data = (
    "<span>Replace this</span> and <span>that</span>",
    "<span>Another</span> test <span>string</span> of <span>tags</span>",
)

for s in data:
    print('In : {!r}\nOut: {!r}\n'.format(s, pat.sub(repl, s)))

# Reset the counter so the next run starts at id10
repl.count = 10
for s in data:
    print('In : {!r}\nOut: {!r}\n'.format(s, pat.sub(repl, s)))

Output

In : '<span>Replace this</span> and <span>that</span>'
Out: '<span><a id="id1" class="my_class">Replace this</a></span> and <span><a id="id2" class="my_class">that</a></span>'

In : '<span>Another</span> test <span>string</span> of <span>tags</span>'
Out: '<span><a id="id3" class="my_class">Another</a></span> test <span><a id="id4" class="my_class">string</a></span> of <span><a id="id5" class="my_class">tags</a></span>'

In : '<span>Replace this</span> and <span>that</span>'
Out: '<span><a id="id10" class="my_class">Replace this</a></span> and <span><a id="id11" class="my_class">that</a></span>'

In : '<span>Another</span> test <span>string</span> of <span>tags</span>'
Out: '<span><a id="id12" class="my_class">Another</a></span> test <span><a id="id13" class="my_class">string</a></span> of <span><a id="id14" class="my_class">tags</a></span>'
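Because the counter lives on the function, it keeps running across calls until you reset it, as the output above shows. If you would rather have each call start its own numbering, one alternative is a closure over an itertools.count object. This is a sketch of that variant, not part of the original answer; the helper name add_links and its start parameter are hypothetical:

```python
import itertools
import re

# Same pattern as in the answer: capture the text inside each <span>...</span>
pat = re.compile(r'<span>(.*?)</span>', re.I)

def add_links(s, start=1):
    """Wrap each <span> body in an <a> tag with ids start, start+1, ..."""
    counter = itertools.count(start)  # fresh counter for every call

    def repl(m):
        # next(counter) yields the id number for this particular match
        return '<span><a id="id{}" class="my_class">{}</a></span>'.format(
            next(counter), m.group(1))

    return pat.sub(repl, s)

print(add_links("<span>Replace this</span> and <span>that</span>"))
print(add_links("<span>Another</span> test <span>string</span>", start=10))
```

Each call to add_links starts numbering again, so there is no shared state to reset; pass start to begin at a different id.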