Question

import re
def multiwordReplace(text, wordDic):
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    def translate(match):
        return wordDic[match.group(0)]
    return rc.sub(translate, text)

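This code was copied from another source, but I'm not sure how it replaces the words in a passage of text, and I don't understand why the 're' module is used here.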

Answer 1

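re.compile() - compiles the expression string into a regex object. The string consists of the escaped keys of wordDic joined with the separator |. Given wordDic {'hello':'hi', 'goodbye': 'bye'}, the string will be 'hello|goodbye', which can be read as "hello or goodbye".
def translate(match): - defines a callback function that handles each match.
rc.sub(translate, text) - performs the string replacement. Whenever the regex matches part of the text, the callback looks up the matched text (which is in fact a key of wordDic) in wordDic and returns its translation.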
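To make the pattern concrete, here is a small sketch that builds the same string and shows what re.escape does to a key containing a special character. The key 'a.b' is a made-up example, not something from the question.

```python
import re

wordDic = {'hello': 'hi', 'goodbye': 'bye'}

# Joining the escaped keys with | gives an alternation pattern.
pattern = '|'.join(map(re.escape, wordDic))
print(pattern)  # hello|goodbye

# re.escape puts a backslash before regex metacharacters such as '.',
# so the key is matched literally. 'a.b' is a hypothetical key.
escaped = re.escape('a.b')
print(escaped)  # a\.b
```

Without the escaping, the '.' in such a key would match any character instead of a literal dot.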

Example:

wordDic = {'hello':'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
translated = multiwordReplace(text, wordDic)
print(translated)

The output is:

hi my friend, I just wanted to say bye

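Edit

The main advantage of using re.compile() is better performance when the same regex is used many times. Here the regex is compiled on every function call, so there is no gain. If the same wordDic is used many times, generate a multiwordReplace function for that wordDic, so that the compilation happens only once: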

import re

def generateMwR(wordDic):
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    def f(text):
        def translate(match):
            print(match.group(0))
            return wordDic[match.group(0)]
        return rc.sub(translate, text)
    return f

Usage:

wordDic = {'hello': 'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
f = generateMwR(wordDic)
translated = f(text)
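A minimal sketch of why the closure helps: the pattern is compiled once in the outer function and then reused for every text. The name make_replacer and the sample sentences are assumptions for illustration, and the debug print is left out.

```python
import re

def make_replacer(wordDic):
    # Compile once, when the replacer is created.
    rc = re.compile('|'.join(map(re.escape, wordDic)))

    def replace(text):
        # The matched text is always a key of wordDic, so look it up.
        return rc.sub(lambda match: wordDic[match.group(0)], text)

    return replace

replace = make_replacer({'hello': 'hi', 'goodbye': 'bye'})

# The same compiled pattern serves all three calls.
for line in ['hello there', 'goodbye for now', 'hello and goodbye']:
    print(replace(line))
```

The returned function keeps a reference to both rc and wordDic, which is what lets later calls skip the compile step.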

Answer 2

Piece by piece...

import re

# Our dictionary
wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'}

# Escape every key in the dictionary with re.escape.
# Escaping is required so that any special characters in the
# dictionary words won't be interpreted as regex syntax
map(re.escape, wordDic)

# Join all escaped keys with a pipe | to make the string 'hello|hi|hey'
'|'.join(map(re.escape, wordDic))

# Compile that string into a regular expression object.
# The pipe in the string is interpreted as "OR",
# so our regex will now try to find "hello" or "hi" or "hey"
rc = re.compile('|'.join(map(re.escape, wordDic)))

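So rc now matches the words in the dictionary, and rc.sub replaces those words in the given string. The translate function simply returns the value for the key that the regex matched.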
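The last step can be sketched as follows, using the dictionary from this answer. The sample sentence is made up for illustration, and a lambda stands in for the translate function.

```python
import re

wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'}
rc = re.compile('|'.join(map(re.escape, wordDic)))

text = 'hello, hi and hey'  # hypothetical sample sentence

# Every match is one of the dictionary keys...
matches = rc.findall(text)
print(matches)  # ['hello', 'hi', 'hey']

# ...so the callback can use the matched text as a lookup key.
result = rc.sub(lambda match: wordDic[match.group(0)], text)
print(result)  # foo, bar and baz
```

Note that the pattern has no word boundaries, so it also matches inside longer words: with this dictionary, 'this' becomes 'tbars'. Wrapping the alternation in \b on both sides is one possible way to restrict it to whole words.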

How does this word replacement function work?
