Question

Hi, I'm trying to replace multiple words in a single pass using the following function:

import re

def multiple_replace(text, a_dict):
    # One alternation of all (escaped) keys, tried in dict order
    regex = re.compile("(%s)" % "|".join(map(re.escape, a_dict.keys())))
    return regex.sub(lambda mo: a_dict[mo.group(0)], text)

But my problem is this: say I have a dictionary like

replacements = { 'hello1': 'hi', 'hello111' : 'GoodMorning', 'world' : 'earth' }

and I try:

s = " hello111 world"
multiple_replace(s, replacements)

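The function matches hello1 instead of hello111, so the result is " hi11 earth" rather than the intended " GoodMorning earth". Any leads would be great!

I thought about reversing the search so that the function starts with the longest key, since my keys are already sorted, but that may not be the best approach.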
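Why this happens: Python's `re` tries the alternatives of `a|b` from left to right and returns the first one that matches at a given position; it does not search for the longest match. A minimal illustration using only the two overlapping keys from the dictionary above:

```python
import re

# Python's re alternation is leftmost-first: at a given position it
# returns the first alternative that matches, not the longest one.
first = re.match("hello1|hello111", "hello111")
print(first.group(0))  # hello1

# Listing the longer alternative first changes the result.
longest = re.match("hello111|hello1", "hello111")
print(longest.group(0))  # hello111
```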

Answer 1

Wiktor Stribiżew's comment is right.

Either sort the keys first (longest first) or add a word boundary.

import re

def multiple_replace_sort(text, a_dict):
    # Longest keys first, so a key is always tried before any of its prefixes
    regex = re.compile("(%s)" % "|".join(map(re.escape, sorted(a_dict, key=len, reverse=True))))
    return regex.sub(lambda mo: a_dict[mo.group(0)], text)


def multiple_replace_boundary(text, a_dict):
    # \b after the key: "hello1" fails inside "hello111", so the engine moves on to the next alternative
    regex = re.compile(r"(%s)\b" % "|".join(map(re.escape, a_dict.keys())))
    return regex.sub(lambda mo: a_dict[mo.group(0)], text)
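One caveat worth checking, assuming keys should only ever match whole words: a trailing `\b` only checks the end of a match, so a key can still be replaced inside a longer word (for example `world` in `helloworld`). A small sketch comparing a trailing-only boundary with boundaries on both sides; the helper `replace_with` and the test strings are made up for illustration:

```python
import re

replacements = {'hello1': 'hi', 'hello111': 'GoodMorning', 'world': 'earth'}
alternation = "|".join(map(re.escape, replacements))

def replace_with(pattern, text):
    # Hypothetical helper: look up each matched key in the dictionary
    return re.sub(pattern, lambda mo: replacements[mo.group(0)], text)

# Trailing \b only: hello1 vs hello111 is resolved correctly...
print(repr(replace_with(r"(%s)\b" % alternation, " hello111 world")))  # ' GoodMorning earth'
# ...but "world" inside "helloworld" is still replaced, since only its end is checked
print(replace_with(r"(%s)\b" % alternation, "helloworld"))  # helloearth
# \b on both sides: only whole words are replaced
print(replace_with(r"\b(%s)\b" % alternation, "helloworld"))  # helloworld
```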

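Keys containing non-word characters may not work with the approaches above (a \b after a key ending in, say, "+" fails when a space follows), so they have to be handled separately first, or perhaps there is some better code to handle them.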
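A small illustration of the problem, using a hypothetical key `c++` that ends in non-word characters: `\b` only holds where a word character sits on exactly one side, so it cannot match between `+` and a space, while `\B` can.

```python
import re

# Hypothetical key "c++" ends in non-word characters ('+').
text = "I like c++ a lot"

# Between '+' and ' ' both sides are non-word characters: there is no \b
# there, so a trailing \b makes the whole match fail.
print(re.search(r"(c\+\+)\b", text))  # None

# \B ("not a word boundary") does hold at that position.
print(re.search(r"(c\+\+)\B", text).group(0))  # c++
```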

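def multiple_replace_separate(text, a_dict):
    # Purely alphanumeric keys get \b after them; keys with other characters get \B
    word = [k for k in a_dict if re.fullmatch(r'[a-zA-Z0-9]+', k)]
    non_word = [k for k in a_dict if k not in word]
    # Only build non-empty groups: an empty "()\B" would match '' and raise KeyError
    parts = []
    if non_word:
        parts.append(r"(?:%s)\B" % "|".join(map(re.escape, non_word)))
    if word:
        parts.append(r"(?:%s)\b" % "|".join(map(re.escape, word)))
    regex = re.compile("|".join(parts))
    return regex.sub(lambda mo: a_dict[mo.group(0)], text)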
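A possibly simpler alternative, not part of the original answer: the lookarounds `(?<!\w)` and `(?!\w)` mean "not preceded / not followed by a word character", which acts like a word boundary for ordinary keys and still works for keys such as the hypothetical `c++`, so no splitting into word and non-word keys is needed. A minimal sketch under that assumption; the function name `multiple_replace_lookaround` and the extra `c++` key are made up here:

```python
import re

def multiple_replace_lookaround(text, a_dict):
    # Hypothetical alternative to multiple_replace_separate.
    if not a_dict:
        return text
    # Longest first is not strictly required with the lookahead, but it
    # keeps the preferred (longest) key as the first alternative tried.
    keys = sorted(a_dict, key=len, reverse=True)
    # A key matches only if no word character comes right before or after it.
    pattern = r"(?<!\w)(?:%s)(?!\w)" % "|".join(map(re.escape, keys))
    return re.sub(pattern, lambda mo: a_dict[mo.group(0)], text)

replacements = {'hello1': 'hi', 'hello111': 'GoodMorning', 'world': 'earth', 'c++': 'C plus plus'}
print(repr(multiple_replace_lookaround(" hello111 world", replacements)))  # ' GoodMorning earth'
print(multiple_replace_lookaround("I like c++ and helloworld", replacements))  # I like C plus plus and helloworld
```

Unlike the trailing-`\b` version, this also leaves `world` inside `helloworld` untouched, because the lookbehind rejects a match preceded by a word character.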

Dictionary matching issue with multiple replacements in a single pass
