Question

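Suppose I have a word list like [cat,hat,mat,ate], and I would like to remove every letter a from a string such as acatbatmate, turning it into catbtmate, whenever that a is not part of a word in the list.

At the moment, I can split the string on the words in the list with the following code: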

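import re

# Split on the protected words (the capture group keeps them), then strip 'a' from the rest.
''.join([word.replace('a','') 
         if word not in ['cat','hat','mat','ate'] 
         else word for word in re.split('(cat|hat|mat|ate)','acatbatmate') ])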

Is there a way to remove the letter a directly with re.sub(pattern, repl, string)?

Answer 1

You can do this easily with re:

import re
except_contexts = ['cat','hat','mat','ate']
print(re.sub(r'({})|a'.format("|".join(except_contexts)), lambda x: x.group(1) if x.group(1) else '', 'acatbatmate'))
# => catbtmate

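See the Python 2 demo.

If you are using Python 3.5+, it is even easier with just a backreference, because a group that did not participate in the match is replaced with an empty string: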

import re
except_contexts = ['cat','hat','mat','ate']
print(re.sub(r'({})|a'.format("|".join(except_contexts)), r'\1', 'acatbatmate'))

However, if you plan to replace a with some other text instead of deleting it, you will need the lambda expression.
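To illustrate that last point, here is a minimal sketch that swaps each stray a for another string instead of deleting it. The helper name replace_stray_a and the '_' replacement are made up for this example. A plain r'\1' template cannot do this, because any text added to the template would also be emitted next to the protected words.

```python
import re

except_contexts = ['cat', 'hat', 'mat', 'ate']
pattern = re.compile(r'({})|a'.format('|'.join(except_contexts)))

def replace_stray_a(text, replacement):
    # Hypothetical helper: protected words (group 1) are returned unchanged,
    # every other "a" is swapped for `replacement`.
    return pattern.sub(lambda m: m.group(1) if m.group(1) else replacement, text)

print(replace_stray_a('acatbatmate', '_'))
# => _catb_tmate
```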

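Details

r'({})|a'.format("|".join(except_contexts)) produces the regex (cat|hat|mat|ate)|a. It matches cat, hat, etc. and captures them into Group 1; when that group matched, we replace the match with the group's own contents, so the word is left unchanged. Otherwise the match is a lone a, and we replace it either with an empty string or with whatever replacement is required.

See the regex demo.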
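The following sketch prints the generated pattern and lists each match with its Group 1 value, which may make the explanation above easier to follow. The re.escape call is an added precaution that is not in the original answer; it only matters if a word contains regex metacharacters, and leaves these plain words unchanged.

```python
import re

except_contexts = ['cat', 'hat', 'mat', 'ate']

# re.escape is an assumption added here for safety; it is a no-op for plain letters.
pattern = r'({})|a'.format('|'.join(re.escape(w) for w in except_contexts))
print(pattern)
# => (cat|hat|mat|ate)|a

# (start, whole match, group 1): group 1 is None when a lone "a" matched.
matches = [(m.start(), m.group(0), m.group(1))
           for m in re.finditer(pattern, 'acatbatmate')]
print(matches)
# => [(0, 'a', None), (1, 'cat', 'cat'), (5, 'a', None), (7, 'mat', 'mat')]
```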

Answer 2

Yes, you can (I always wanted to write it like this...):

import regex as re

exceptions = ['cat','hat','mat','ate']
rx = re.compile(r'''(?:{})(*SKIP)(*FAIL)|a+'''.format('|'.join(exceptions)))

word = rx.sub('', 'acatbatmate')
print(word)

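This uses the newer regex module (a third-party package, separate from the standard re), which supports (*SKIP)(*FAIL). The pattern here is: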

(?:cat|hat|mat|ate)(*SKIP)(*FAIL)
|
a+

Without the new module, you can use a handler function:

import re

exceptions = ['cat','hat','mat','ate']

def handler(match):
    if match.group(1):
        return ''
    return match.group(0)

rx = re.compile(r'''(?:{})|(a+)'''.format('|'.join(exceptions)))

word = rx.sub(handler, 'acatbatmate')
print(word)
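One thing that may be worth checking with the handler version: because a+ is greedy, a run of several a characters directly before a protected word is consumed as a whole, including the a that begins the word. The sketch below compares the pattern above with a hypothetical one-letter-at-a-time variant; the names greedy, single and strip_a, and the input 'aaate', are made up for this example.

```python
import re

exceptions = ['cat', 'hat', 'mat', 'ate']
words = '|'.join(exceptions)

greedy = re.compile(r'(?:{})|(a+)'.format(words))  # pattern from the answer above
single = re.compile(r'(?:{})|(a)'.format(words))   # hypothetical variant: one "a" per match

def strip_a(rx, text):
    # Delete whatever group 1 captured; return protected words as matched.
    return rx.sub(lambda m: '' if m.group(1) else m.group(0), text)

print(strip_a(greedy, 'acatbatmate'), strip_a(single, 'acatbatmate'))
# => catbtmate catbtmate
print(strip_a(greedy, 'aaate'), strip_a(single, 'aaate'))
# => te ate
```

On the original input both patterns agree. They differ only when extra a characters sit immediately in front of a word that itself starts with a, so which behavior is wanted depends on the data.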

Python regex: replace a letter if it is not part of a word in a list
