Regex to replace a list of characters in Python

Time: 2015-10-30 00:02:43

Tags: python regex

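I have a list of characters that I want to find in a string, collapsing each run of repeated occurrences into a single occurrence.

But I have run into two problems: when I loop over them, the re.sub function does not replace the repeated occurrences, and when there is a smiley such as :) it replaces ':' with ':)', which I don't want.

Here is the code I tried.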

end_of_line_chars = [".",";","!",":)",":-)","=)",":]",":-(",":(",":[","=(",":P",":-P",":-p",":p","=P"]
for i in end_of_line_chars:
    pattern = "[" + i + "]" + "+"
    str = re.sub(pattern,i,str)

If I use a single character, as shown below, it works.

str = re.sub("[.]+",".",str)

But looping over the list of characters produces an error. How can I solve these two problems? Thanks for your help.

2 answers:

Answer 0: (score: 1)

re.escape(str) does the escaping for you. By separating items with |, you can match alternatives. With (?:…), you can group without capturing. So:

import re

# On Python 2 only, add this line to get the lazy versions of map and filter:
# from itertools import imap as map, ifilter as filter

# Assumes end_of_line_chars is the list from the question.
# Escape all elements so they match literally, e.g. ':-)' becomes r':\-\)'
# (Python versions before 3.7 also escape the ':'):
esc = map(re.escape, end_of_line_chars)
# Wrap each element in a capturing group, so you know which element was found,
# inside a non-capturing group that allows repeats and optional trailing whitespace:
esc = map(r'(?:({})\s*)+'.format, esc)
# Compile one expression that finds any of these elements:
esc = re.compile('|'.join(esc))

# The function to turn a match of repeats into a single item:
def replace_with_one(match):
    # match.groups() holds the captures, where only the found one is truthy,
    # e.g. (None, None, None, None, ':-)', None, None, None, None, None, None, None, None, None, None, None)
    return next(filter(bool, match.groups()))

# This is how you use it:
esc.sub(replace_with_one, '.... :-) :-) :-) :-( .....')
# Returns: '.:-):-(.'
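
The three building blocks mentioned in this answer can also be tried in isolation. The following is only a minimal sketch; the sample strings and the variable names (escaped, alternatives, matched_text, group_count) are made up for illustration and are not from the original answer.

```python
import re

# re.escape turns regex metacharacters such as ')' into literals,
# so the escaped pattern matches the exact text ':-)'.
escaped = re.escape(":-)")
literal_match = re.fullmatch(escaped, ":-)")

# '|' separates alternatives, and (?:...) groups them without
# creating a capture group, so '+' applies to the whole alternation.
alternatives = re.compile(
    r"(?:{}|{})+".format(re.escape(":)"), re.escape(":("))
)
match = alternatives.match(":):(:)")
matched_text = match.group(0)       # the whole run of smileys
group_count = alternatives.groups   # number of capture groups in the pattern
```

Here matched_text covers the full run of smileys and group_count is 0, which shows that (?:…) groups without capturing.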

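Answer 1: (score: 0)

A character class won't work if what you are replacing is not a single character. Instead, use a non-capturing group (and use re.escape so the literal text isn't interpreted as regex special characters):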

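import re

end_of_line_chars = [".",";","!",":)",":-)","=)",":]",":-(",":(",":[","=(",":P",":-P",":-p",":p","=P"]
for i in end_of_line_chars:
    # Repeat the whole escaped token, not its individual characters
    pattern = r"(?:{})+".format(re.escape(i))
    text = re.sub(pattern, i, text)  # text is the input string (called str in the question)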
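
To make the difference concrete, here is a small sketch comparing the character class from the question with the non-capturing group. The sample string and the variable names are assumptions chosen for illustration; it also shows why a token like ":-)" likely caused the error in the loop, since inside brackets ":-)" is read as a reversed character range.

```python
import re

sample = "see: this :):)"  # made-up input for illustration

# A character class matches any ONE of its characters, so "[:)]+"
# also rewrites the lone ':' after "see".
class_result = re.sub("[:)]+", ":)", sample)

# A non-capturing group repeats the whole two-character token ":)".
group_pattern = r"(?:{})+".format(re.escape(":)"))
group_result = re.sub(group_pattern, ":)", sample)

# Inside brackets, ":-)" is parsed as a range from ':' down to ')',
# which is invalid and raises re.error.
try:
    re.compile("[:-)]+")
    bad_range_failed = False
except re.error:
    bad_range_failed = True
```

With this input, class_result is 'see:) this :)' while group_result is 'see: this :)', and bad_range_failed ends up True.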