Question

I am new to Python and terrible with regular expressions. My requirement is to modify a pattern in existing code.

I have extracted the code that I need to fix.

import re

def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer

def main():
    repkeys = {'modify': 'modifyNew', 'extract': 'extractNew'}
    with open('test.xml', 'r') as f:  # avoid shadowing the Python 2 builtin `file`
        filedata = f.read()
    pattern = r'\b\w+\b' # this pattern matches whole words only
    #pattern = r'[\'"]\w+[\'"]'
    #pattern = r'["]\w+["]' 
    #pattern = '\b[\'"]\w+[\'"]\b'
    #pattern = '(["\'])(?:(?=(\\?))\2.)*?\1'

    replacer = replacer_factory(repkeys)
    filedata = re.sub(pattern, replacer, filedata)

if __name__ == '__main__':
    main()

Input

<fn:modify ele="modify">
<fn:extract name='extract' value="Title"/>
</fn:modify>

Expected output. Note that the word to be replaced can be enclosed in either single or double quotes.

<fn:modify ele="modifyNew">
<fn:extract name='extractNew' value="Title"/>
</fn:modify>

The existing pattern r'\b\w+\b' produces <fn:modifyNew ele="modifyNew">, but what I am looking for is <fn:modify ele="modifyNew">.

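The patterns I have tried so far are given as comments. I realized later that a few of them are wrong, because string literals prefixed with r treat backslashes differently from ordinary literals, and so on. I have still included them to show everything I have tried so far.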
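To make the raw-string point concrete, here is a small sketch (not part of the original code; the sample text is made up) comparing the same pattern text with and without the r prefix. In an ordinary literal, '\b' is the backspace character, so the regex engine never sees a word boundary.

```python
import re

text = 'ele="modify"'

plain = '\bmodify\b'   # '\b' is a single backspace character (0x08) here
raw = r'\bmodify\b'    # backslash is kept, so the regex engine sees \b (word boundary)

print(len('\b'), len(r'\b'))
print(re.search(plain, text))        # no match: the text contains no backspace characters
print(re.search(raw, text).group())  # matches the word between the quotes
```

The same applies to '\1' and '\2' in the commented-out patterns: without the r prefix they become control characters rather than backreferences.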

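It would be great if I could find a pattern that solves this problem without changing the logic. If that is not possible with the existing code, please point that out as well. The environment I work in has Python 2.6.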

Thanks for your help.

Answer 1

You need to use the r'''(['"])(\w+)\1''' regular expression and then adapt the replacement function:

def replacer_factory(spelling_dict):
    def replacer(match):
        return '{0}{1}{0}'.format(match.group(1), spelling_dict.get(match.group(2), match.group(2)))
    return replacer

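The pattern (['"])(\w+)\1 matches a word enclosed in double or single quotes: group 1 captures the opening quote, and \1 requires the same character to close it. The word itself is in group 2, hence spelling_dict.get(match.group(2), match.group(2)). The quotes also have to be put back, hence '{0}{1}{0}'.format().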
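As a quick illustration of the two groups, the following sketch prints what each group holds for every match. The sample string is made up for this example; its last attribute deliberately uses mismatched quotes to show that \1 rejects it.

```python
import re

pattern = r'''(['"])(\w+)\1'''
# Hypothetical sample: two well-quoted words and one with mismatched quotes
sample = 'ele="modify" name=\'extract\' bad="mixed\''

for m in re.finditer(pattern, sample):
    # group(1) is the quote character, group(2) is the word between the quotes
    print(repr(m.group(1)), m.group(2))

matches = re.findall(pattern, sample)
```

Only the first two attributes are matched; "mixed' is skipped because the closing quote differs from the opening one.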

See the Python demo:

import re
def replacer_factory(spelling_dict):
    def replacer(match):
        return '{0}{1}{0}'.format(match.group(1), spelling_dict.get(match.group(2), match.group(2)))
    return replacer

repkeys = {'modify': 'modifyNew', 'extract': 'extractNew'}
pattern = r'''(['"])(\w+)\1'''
replacer = replacer_factory(repkeys)
filedata = """<fn:modify ele="modify">
<fn:extract name='extract' value="Title"/>
</fn:modify>"""
print(re.sub(pattern, replacer, filedata))

Output:

<fn:modify ele="modifyNew">
<fn:extract name='extractNew' value="Title"/>
</fn:modify>

Python pattern to replace a word between single or double quotes
