多个正则表达式替换

时间:2014-12-23 07:40:34

标签: python regex

我使用以下代码来规范化文件的名称:

new_file = re.sub('[. ]', '_', old_file.lower())
new_file = re.sub('__+', '_', new_file)
new_file = re.sub('[][)(}{]',  '', new_file)
new_file = re.sub('[-_]([^-_]+)$',  r'.\1', new_file)

我的问题是有可能以更好的方式编写此代码吗?

我找到了following example

def multiple_replace(dict, text):
    # Create a regular expression  from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))

    # For each match, look-up corresponding value in dictionary
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text) 

dict = {
    "Larry Wall" : "Guido van Rossum",
    "creator" : "Benevolent Dictator for Life",
    "Perl" : "Python",
} 

但此代码仅适用于普通字符串。第3行中的map(re.escape, ...“摧毁”正则表达式。

的问候,
射线

1 个答案:

答案 0 :(得分:3)

如果您只是寻找更易于维护且重复性更低的代码(而不是算法更改),请使用简单的for循环:

SUBS = [
  ('[. ]', '_'),
  ('__+', '_'),
  ('[][)(}{]',  ''),
  ('[-_]([^-_]+)$',  r'.\1'),
]

def normalize(name):
    name = name.lower()
    for pattern, replacement in SUBS:
        name = re.sub(pattern, replacement, name)
    return name