How excluding a string with a regular expression actually works

Time: 2018-12-18 20:02:57

Tags: python regex-negation regex-lookarounds

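I don't understand how negation works in regular expressions. I followed a few posts (post 1, post 2) and used their patterns, and they work, but their explanations don't make sense to me. I tried several regex tester sites, such as regex101, but they could not handle the patterns from posts 1 and 2 that appear to work in Python.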

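I would prefer regex to behave the same way for negative logic as it does for positive logic. However, it seems to me that as soon as negation is involved, a whole new and hard-to-follow mode of processing kicks in. I know there are workarounds, but I would like to understand how to do this with regex itself.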

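Goal of the example below: suppose I have a list of commodities, and I want to get from it the list of those that are not natural gas, as defined by the variable expect. In other words, I need a list of products whose names do not contain the word "gas".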
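For reference, the kind of workaround mentioned above might look like the following minimal sketch: match the unwanted word positively and negate the result in Python rather than in the pattern. The names `gas` and `no_gas` are only illustrative and are not part of the helper code in this question.

```python
import re

# Sample data taken from the question.
cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']

# Positive match for the unwanted word; the negation happens in Python.
gas = re.compile(r'gas', re.IGNORECASE)
no_gas = [c for c in cmdty if not gas.search(c)]

print(no_gas)  # ['Crude Oil', 'Brent', 'WTI']
```

This gives the desired list, but it sidesteps the question of how to express the exclusion inside the pattern itself.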

Here is the helper code I use to try out different ideas:

import re

cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']
expect = cmdty[-3:]  # i.e. ['Crude Oil', 'Brent', 'WTI']
print(f'Starting list: {cmdty}. Would like to get: {expect}')

def check(pattern, cmdty=cmdty, expect=expect, comment=""):
    # Keep every commodity in which the pattern matches somewhere.
    out = [c for c in cmdty if re.search(pattern, c)]
    # Compare as sets, so the order of the results does not matter.
    good = "yes" if set(out) == set(expect) else "no"
    print(f'pattern={pattern:20}: worked: {good:>3}. output={out}. comment: {comment}')

Various attempts at a regex that does this:

check(pattern='(?i)(?=gas)',comment="This one works, but requires negating the results")
check(pattern='(?i)(?!gas)',comment="My hope was that this would work")
check('(?i)(?:!gas)',comment="")
check(r'(?i)\s(?!gas)',comment="strange outcome")
check('(?i).*(?!gas).*')
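# Note: on Python 3.11+ the next two calls raise re.error, because "(?i)" must be at the very start of the pattern.
check('^(?i)(?!.*gas).*$', comment='works')
check('^(?i)((?!gas).)*$', comment='not sure this one works')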
check('(?i)^.*(?!gas).*$',comment="I'd expect this one to work, but does not")
check('(?i)^(?!.*gas).*$', comment='works')
check('(?i)nat(?!gas)', comment='makes sense, but super odd')
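Several of the attempts above differ only in where `(?i)` sits relative to `^`. A sketch of one way to take that variable out of the experiment, assuming it is acceptable to pass the flag separately: compile the pattern with `re.IGNORECASE`. The names `pattern` and `kept` are only illustrative.

```python
import re

# Sample data taken from the question.
cmdty = ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']

# The flag is passed as an argument, so the pattern holds only the matching logic.
pattern = re.compile(r'^(?!.*gas).*$', re.IGNORECASE)
kept = [c for c in cmdty if pattern.search(c)]

print(kept)  # ['Crude Oil', 'Brent', 'WTI']
```

With the flag moved out, the remaining differences between the attempts come down to where the lookahead sits and what it is anchored to.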

Starting list and goal:

Starting list: ['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI'].

Would like to get: ['Crude Oil', 'Brent', 'WTI']

Here is the output of the various attempts. What is the right way to think about this so that it makes sense?

pattern=(?i)(?=gas)         : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract']. comment: This one works, but requires negating the results
pattern=(?i)(?!gas)         : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment: My hope was that this would work
pattern=(?i)(?:!gas)        : worked:  no. output=[]. comment: 
pattern=(?i)\s(?!gas)       : worked:  no. output=['Henry Hub Natural Gas Contract', 'Crude Oil']. comment: strange outcome
pattern=(?i).*(?!gas).*     : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment: 
pattern=^(?i)(?!.*gas).*$   : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: works
pattern=^(?i)((?!gas).)*$   : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: not sure this one works
pattern=(?i)^.*(?!gas).*$   : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract', 'Crude Oil', 'Brent', 'WTI']. comment: I'd expect this one to work, but does not
pattern=(?i)^(?!.*gas).*$   : worked: yes. output=['Crude Oil', 'Brent', 'WTI']. comment: works
pattern=(?i)nat(?!gas)      : worked:  no. output=['natural gas', 'Henry Hub Natural Gas Contract']. comment: makes sense, but super odd
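One way to probe these results is to look at where each pattern matches and what text it consumes, instead of only whether it matches. The sketch below does that for 'natural gas'. `where` is a hypothetical helper, not part of the code above, and it assumes case-insensitive matching via `re.IGNORECASE` in place of the inline `(?i)`.

```python
import re

def where(pattern, text):
    """Return (start, end, matched_text) for the first match, or None."""
    m = re.search(pattern, text, re.IGNORECASE)
    return (m.start(), m.end(), m.group()) if m else None

text = 'natural gas'

# A lookahead consumes nothing. Index 0 is already a position
# that is not followed by 'gas', so the search succeeds there.
print(where(r'(?!gas)', text))         # (0, 0, '')

# The positive lookahead succeeds only where 'gas' begins.
print(where(r'(?=gas)', text))         # (8, 8, '')

# Greedy '.*' can consume the whole string, so (?!gas) is tested
# at the very end, where nothing follows.
print(where(r'^.*(?!gas).*$', text))   # (0, 11, 'natural gas')

# Anchored at the start, the lookahead inspects the entire string.
print(where(r'^(?!.*gas).*$', text))   # None

# The lookahead is re-checked before every single character.
print(where(r'^((?!gas).)*$', text))   # None
```

The empty matches in the first two calls suggest that `(?!gas)` on its own asks "is there any position not followed by gas?", which is true for almost every string, rather than "does this string lack gas?".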

0 answers:

No answers yet