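import re

def multiwordReplace(text, wordDic):
    # Build one pattern that matches any key of wordDic, e.g. 'hello|goodbye'
    rc = re.compile('|'.join(map(re.escape, wordDic)))

    def translate(match):
        # Look up the replacement for the key that was matched
        return wordDic[match.group(0)]

    return rc.sub(translate, text)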
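This code was copied from another source, but I'm not sure how it replaces words in a passage of text, and I don't understand why the 're' module is used here.
Answer 0 (score: 1)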
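re.compile() - compiles the expression string into a regex object. The string consists of the escaped keys of wordDic joined by the separator |. Given wordDic {'hello':'hi', 'goodbye': 'bye'}, the string will be 'hello|goodbye', which reads as "hello or goodbye".
def translate(match): - defines a callback function that handles each match.
rc.sub(translate, text) - performs the string replacement. Wherever the regex matches in the text, the callback looks the match up in wordDic (the match is always one of wordDic's keys) and returns the translation.
Example: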
wordDic = {'hello':'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
translated = multiwordReplace(text, wordDic)
print(translated)
The output is:
hi my friend, I just wanted to say bye
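To see what re.compile() actually receives, the intermediate pattern string can be printed. The following is only a small sketch: the key 'a.b' is a made-up entry added to show what re.escape does, and the key order in the pattern relies on dictionaries preserving insertion order (Python 3.7+).

```python
import re

# 'a.b' is a hypothetical key, added only to demonstrate escaping
wordDic = {'hello': 'hi', 'goodbye': 'bye', 'a.b': 'dot'}

# Escape each key, then join the keys with | ("or")
pattern = '|'.join(map(re.escape, wordDic))
print(pattern)  # hello|goodbye|a\.b

rc = re.compile(pattern)
# The escaped dot matches only a literal '.', so 'axb' is not found
print(rc.findall('hello a.b axb goodbye'))  # ['hello', 'a.b', 'goodbye']
```

Without re.escape, the dot in 'a.b' would match any character and 'axb' would be replaced as well.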
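Edit:
The main advantage of using re.compile() is better performance when the same regex is used many times. Here the regex is compiled on every function call, so there is no gain. If the same wordDic is used many times, generate a multiwordReplace function for that wordDic, so that the compilation happens only once: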
import re

def generateMwR(wordDic):
    # Compile once; the returned function reuses rc on every call
    rc = re.compile('|'.join(map(re.escape, wordDic)))

    def f(text):
        def translate(match):
            print(match.group(0))  # debug output: shows each matched key
            return wordDic[match.group(0)]
        return rc.sub(translate, text)

    return f
Usage:
wordDic = {'hello': 'hi', 'goodbye': 'bye'}
text = 'hello my friend, I just wanted to say goodbye'
f = generateMwR(wordDic)
translated = f(text)
Answer 1 (score: 1)
Piece by piece...
# Our dictionary
wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'}
# Escape every key in the dictionary with re.escape.
# Escaping is required so that any special characters in
# the dictionary words won't mess up the regex
map(re.escape, wordDic)
# join all escaped key elements with pipe | to make a string 'hello|hi|hey'
'|'.join(map(re.escape, wordDic))
# Make a regular expressions instance with given string.
# the pipe in the string will be interpreted as "OR",
# so our regex will now try to find "hello" or "hi" or "hey"
rc = re.compile('|'.join(map(re.escape, wordDic)))
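So rc now matches any of the words in the dictionary, and rc.sub replaces those words in the given string. The translate function simply returns the value for the key that the regex matched.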
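One thing to keep in mind is that the pattern matches substrings, not whole words, so with the dictionary above the "hi" inside "this" would also be replaced. If only whole words should be replaced, one possible variation is sketched below. The function name multiword_replace_whole_words is made up for this example, and the \b anchors assume that every key starts and ends with a letter, digit or underscore.

```python
import re

def multiword_replace_whole_words(text, wordDic):
    # Longest keys first, so a longer key wins over a shorter key that is its prefix
    keys = sorted(wordDic, key=len, reverse=True)
    # \b anchors the alternatives at word boundaries
    rc = re.compile(r'\b(?:' + '|'.join(map(re.escape, keys)) + r')\b')
    return rc.sub(lambda match: wordDic[match.group(0)], text)

wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'}
text = 'hi, this is it'

# Original pattern: the 'hi' inside 'this' is replaced too
plain = re.compile('|'.join(map(re.escape, wordDic)))
substring_result = plain.sub(lambda m: wordDic[m.group(0)], text)
print(substring_result)  # bar, tbars is it

# Whole-word variant: only the standalone 'hi' is replaced
whole_word_result = multiword_replace_whole_words(text, wordDic)
print(whole_word_result)  # bar, this is it
```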