How does this word replacement function work?

Time: 2016-05-13 10:12:49

Tags: python regex function python-3.x

import re
def multiwordReplace(text, wordDic):
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    def translate(match):
        return wordDic[match.group(0)]
    return rc.sub(translate, text)

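I copied this code from another source, but I am not sure how it replaces words in a paragraph of text, and I don't understand why the 're' module is used here.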

2 answers:

Answer 0: (score: 1)

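  1. re.compile() - compiles the expression string into a regex object. The string consists of the keys of wordDic (each passed through re.escape) joined with the separator |. Given wordDic {'hello':'hi', 'goodbye': 'bye'}, the string would be 'hello|goodbye', which can be read as "hello or goodbye".
  2. def translate(match): - defines a callback function that handles each match.
  3. rc.sub(translate, text) - performs the string replacement. Whenever the regex matches in the text, the callback looks up the matched text (which is in fact a key of wordDic) in wordDic and returns its translation.
  4. Example: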

    wordDic = {'hello':'hi', 'goodbye': 'bye'}
    text = 'hello my friend, I just wanted to say goodbye'
    translated = multiwordReplace(text, wordDic)
    print(translated)
    

    The output is:

    hi my friend, I just wanted to say bye
    

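    Edit

    The main advantage of using re.compile() is better performance when the same regex is used multiple times. In multiwordReplace the regex is compiled on every function call, so there is no gain. If the same wordDic is used many times, generate a replacement function for that wordDic so that the compilation happens only once: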

    import re
    def generateMwR(wordDic):
        rc = re.compile('|'.join(map(re.escape, wordDic)))
        def f(text):
            def translate(match):
                print(match.group(0))
                return wordDic[match.group(0)]
            return rc.sub(translate, text)
        return f
    

    Usage is as follows:

    wordDic = {'hello': 'hi', 'goodbye': 'bye'}
    text = 'hello my friend, I just wanted to say goodbye'
    f = generateMwR(wordDic)
    translated = f(text)
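
To make steps 1 to 3 concrete, here is a minimal sketch that prints the generated pattern and records what the callback receives. The `seen` list is a hypothetical helper added only for illustration, and the key order shown assumes Python 3.7+, where dictionaries keep insertion order.

```python
import re

wordDic = {'hello': 'hi', 'goodbye': 'bye'}

# Step 1: build the alternation pattern from the dictionary keys.
pattern = '|'.join(map(re.escape, wordDic))
print(pattern)  # hello|goodbye

rc = re.compile(pattern)

seen = []  # hypothetical helper: collects every matched key


def translate(match):
    # Step 2: the callback is called once per match.
    seen.append(match.group(0))
    return wordDic[match.group(0)]


# Step 3: sub() replaces each match with the callback's return value.
result = rc.sub(translate, 'hello my friend, I just wanted to say goodbye')
print(seen)    # ['hello', 'goodbye']
print(result)  # hi my friend, I just wanted to say bye
```

Note that the pattern has no word boundaries, so a key is also replaced when it occurs inside a longer word.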
    

Answer 1: (score: 1)

Piece by piece...

 # Our dictionary
 wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'}

 # Escape every key in the dictionary with the regular expression escape character.
 # Escaping is required so that possible special characters in
 # dictionary words won't mess up the regex
 map(re.escape, wordDic)

 # join all escaped key elements with pipe | to make a string 'hello|hi|hey'
 '|'.join(map(re.escape, wordDic))

 # Make a regular expressions instance with given string.
 # the pipe in the string will be interpreted as "OR", 
 # so our regex will now try to find "hello" or "hi" or "hey"
 rc = re.compile('|'.join(map(re.escape, wordDic)))

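So rc now matches the words in the dictionary, and rc.sub replaces those words in the given string. The translate function simply returns the value that corresponds to the key the regex matched.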
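
The escaping step is easiest to appreciate with a key that contains a regex special character. The sketch below is only an illustration; the dictionary and text are made-up values, not taken from the question. It compares the escaped pattern with an unescaped one for a key containing a dot.

```python
import re

# Hypothetical example data: the key contains '.', which means
# "any character" in a regular expression.
wordDic = {'3.5': 'three and a half'}
text = '3.5 is not 345'

# With re.escape the dot is matched literally.
escaped = re.compile('|'.join(map(re.escape, wordDic)))
replaced = escaped.sub(lambda m: wordDic[m.group(0)], text)
print(replaced)  # three and a half is not 345

# Without escaping, the dot also matches the '4' in '345'.
unescaped = re.compile('|'.join(wordDic))
matches = unescaped.findall(text)
print(matches)  # ['3.5', '345']
```

With the unescaped pattern, the replacement callback would then look up '345' in wordDic and raise a KeyError, because that string is not a key.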