Python使用正则表达式替换短代码

时间:2014-09-13 13:17:12

标签: regex python-2.7 text replace shortcode

我有一个看起来像这样的字符串:

my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"

我需要将它转换成这个:

new_str = "This sentence has a <b>bolded</b> word, and <b>another</b> one too!"

是否可以使用Python的string.replacere.sub方法智能地执行此操作?

3 个答案:

答案 0 :(得分:1)

试试这个表达式:[[]b[|](\w+)[]]缩短版也可以是\[b\|(\w+)\]

表达式搜索以[b|开头的任何内容时,使用]捕获它与结束\w+之间的内容,这意味着[a-zA-Z0-9_]包含更广泛的范围您还可以使用.*?代替\w+的字符,\[b\|(.*?)\]

Online Demo

示例演示:

import re
p = re.compile(ur'[[]b[|](\w+)[]]')
test_str = u"This sentence has a [b|bolded] word, and [b|another] one too!"
subst = u"<bold>$1</bold>"

result = re.sub(p, subst, test_str)

<强>输出:

This sentence has a <bold>bolded</bold> word, and <bold>another</bold> one too!

答案 1 :(得分:1)

只需将|[]之前的所有字符捕获到一个组中即可。并且|之后的部分进入另一个组。只需通过替换部件中的反向引用来调用捕获的组,即可获得所需的输出。

<强>正则表达式:

\[([^\[\]|]*)\|([^\[\]]*)\]

Replacemnet string:

<\1>\2</\1>

DEMO

>>> import re
>>> s = "This sentence has a [b|bolded] word, and [b|another] one too!"
>>> m = re.sub(r'\[([^\[\]|]*)\|([^\[\]]*)\]', r'<\1>\2</\1>', s)
>>> m
'This sentence has a <b>bolded</b> word, and <b>another</b> one too!'

Explanation...

答案 2 :(得分:0)

仅供参考,以防您不想要两个问题:

快速回答您的特定问题:

my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"

print my_str.replace("[b|", "<b>").replace("]", "</b>")
# output:
# This sentence has a <b>bolded</b> word, and <b>another</b> one too!

这有一个缺陷,即无论是否合适,它都会将所有]替换为</b>。所以你可能想要考虑以下几点:

概括并将其包装在函数

def replace_stuff(s, char):
    begin = s.find("[{}|".format(char))
    while begin != -1:
        end = s.find("]", begin)
        s = s[:begin] + s[begin:end+1].replace("[{}|".format(char),
            "<{}>".format(char)).replace("]", "</{}>".format(char)) + s[end+1:]
        begin = s.find("[{}|".format(char))
    return s

例如

s = "Don't forget to [b|initialize] [code|void toUpper(char const *s)]."

print replace_stuff(s, "code")
# output: 
# "Don't forget to [b|initialize] <code>void toUpper(char const *s)</code>."