Question

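The text I'm analyzing contains many common words, and I want to use fuzzy regex matching to replace misspellings of them with the correct spelling.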

I know I can loop over them like this:

import regex as re

edits = 1
my_arr = ['word1', 'word2', 'word3']
my_text = 'this is my text with wrd1 in it'

for word in my_arr:
    # (word){e<=1} matches `word` with at most `edits` errors (regex-module syntax)
    r_pattern = '(' + word + '){e<=' + str(edits) + '}'
    my_text = re.sub(r_pattern, word, my_text)

But is there a way to do this in a single regex.sub call? That is, my pattern might look something like:

r_pattern = '(word1|word2|word3){e<=1}'
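A pattern of that shape can be built from the list with '|'.join instead of by hand. This is a minimal sketch: build_pattern is a hypothetical helper name, and the standard-library re.escape is used only so that any regex metacharacters inside a word are matched literally. The {e<=n} fuzzy constraint is syntax of the third-party regex module, so the resulting string is meant to be passed to regex.sub, not to the standard-library re.

```python
import re

def build_pattern(words, edits):
    """Build '(w1|w2|...){e<=edits}' for the third-party regex module.

    re.escape only escapes metacharacters in each word; the whole group of
    alternatives then gets a single fuzzy constraint.
    """
    alternatives = "|".join(re.escape(w) for w in words)
    return "(" + alternatives + "){e<=" + str(edits) + "}"

print(build_pattern(["word1", "word2", "word3"], 1))
```

With a single group around all alternatives, a match only reports that group 1 matched, not which alternative inside it, so the replacement cannot directly pick the right word; the answer below gives each word its own named group for that reason.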

Answer 1

Here is my solution:

import regex as re

def repl(matchObj):
    # lastgroup is the name of the group that matched; each group is named
    # after its correctly spelled word, so that name is the replacement
    return matchObj.lastgroup

edits = 1
my_arr = ['word1', 'word2', 'word3']
my_text = 'this is my text with wrd3 in it'

# Build (?P<word1>word1){e<=1}|(?P<word2>word2){e<=1}|...
# Each word is used as a group name, so it must be a valid identifier.
r_pattern = '|'.join(
    '(?P<' + word + '>' + word + '){e<=' + str(edits) + '}' for word in my_arr
)

r = re.compile(r_pattern)
my_text = r.sub(repl, my_text)
print(my_text)

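It uses the match object's lastgroup attribute, which tells you which group caused the replacement to fire. Because each group is named after the word it matches, returning that name substitutes the correct spelling. If needed, this should scale well to larger arrays, provided any limits of re.compile don't get in your way. Hope this helps.

Python documentation covering lastgroup: https://docs.python.org/2/library/re.html
A handy regex editor for working through future problems: https://regex101.com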
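The lastgroup dispatch can be tried on its own with the standard-library re, which also has Match.lastgroup but no fuzzy {e<=n} syntax. In this sketch each group's name is the canonical spelling and its body is a hand-written set of accepted variants (the words and variants are made up for illustration); with the regex module the bodies would instead be fuzzy, as in the answer's (?P<word1>word1){e<=1}.

```python
import re

# Group name = canonical spelling; group body = what to accept.
# These bodies are exact variants; with the regex module they could be
# fuzzy instead, e.g. (?P<color>color){e<=1}.
variants = {
    "color": r"colou?r",
    "gray": r"gr[ae]y",
}

pattern = re.compile("|".join(
    "(?P<%s>%s)" % (name, body) for name, body in variants.items()
))

def repl(m):
    # lastgroup names the group that took part in this match,
    # i.e. which alternative fired
    return m.lastgroup

print(pattern.sub(repl, "the colour grey"))  # the color gray
```

One thing to check with the fuzzy version: the regex module documentation states that fuzzy matching returns the first match that satisfies the constraints rather than the best one, and alternatives are tried left to right. With words as close as word1 and word2, an exact word2 in the text can therefore be matched by the word1 alternative with one substitution and "corrected" to word1. The module's BESTMATCH flag, (?b), is documented to search for the best match instead, which may avoid this.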

Correcting spelling with fuzzy regex (in Python)
