Python regex does not work as expected when flags are passed as a positional argument to re.sub

Time: 2014-01-14 09:09:09

Tags: python regex regex-negation

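So, here's my problem:

I'm trying to replace anything in a string that isn't "A" or "C", case-insensitively. My strings are all three characters long. (In practice, the specific two letters will vary, which is why I don't want to hard-code the negated values.)

So, I thought I would do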

re.sub(r'[^ac]', "X", "ABC", re.IGNORECASE)

However, what I get is 'XXC'. I was expecting 'AXC'.

With my full data set:

map(lambda s: re.sub(r'[^ac]', "X", s, re.IGNORECASE), [ "ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc" ])

What I get is this:

['XXC', 'XXc', 'XXC', 'XXc', 'aXX', 'aXc', 'aXX', 'aXc']

Why does re.IGNORECASE cause "A" to be replaced? And why is "C" sometimes replaced as well? (Note how it turns "abC" into "aXX".)

If I do this instead:

map(lambda s: re.sub(r'[^acAC]', "X", s), [ "ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc" ])

I get what I want:

['AXC', 'AXc', 'AXC', 'AXc', 'aXC', 'aXc', 'aXC', 'aXc']

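Do I have to use r'[^acAC]'? Is there a way to negate a character class case-insensitively?

I also find it interesting that in vim, if I put all of these strings into a text file and run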

:%s/[^ac]/X/gi

I get the right result. And, blasphemy perhaps, if I do this in Perl:

    #! /usr/bin/perl

    use strict;

    foreach my $gene ( "ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc") {    
            my $replaced = $gene;
            $replaced =~ s/[^ac]/X/gi;
            printf("%s\n", $replaced);
    }

I get

AXC
AXc
AXC
AXc
aXC
aXc
aXC
aXc

The same goes for Ruby:

irb(main):001:0> ["ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc"].collect{|s| s.gsub(/[^ac]/i,"X") }
=> ["AXC", "AXc", "AXC", "AXc", "aXC", "aXc", "aXC", "aXc"]

How do I do the equivalent in Python without resorting to r'[^acAC]'?

Thanks!

1 Answer:

Answer 0: (score: 2)

Pass flags as a keyword argument rather than a positional argument:

>>> re.sub(r'[^ac]', "X", "ABC", flags=re.IGNORECASE)
'AXC'
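
Applied to the full list from the question, the keyword form gives the result the asker expected. The sketch below also shows two alternatives that sidestep the argument order entirely: an inline `(?i)` flag and a precompiled pattern. The helper name `mask_except` and its parameters are made up for illustration (they cover the case where the two letters vary), and the code assumes Python 3.

```python
import re

GENES = ["ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc"]


def mask_except(text, keep="ac", fill="X"):
    """Replace every character not in `keep` (case-insensitively) with `fill`.

    Hypothetical helper for illustration, not part of the re module.
    """
    # re.escape guards against characters such as ']' or '^' in `keep`.
    pattern = "[^" + re.escape(keep) + "]"
    return re.sub(pattern, fill, text, flags=re.IGNORECASE)


# 1. flags passed by keyword, as in the answer
by_keyword = [mask_except(s) for s in GENES]

# 2. inline flag inside the pattern itself
by_inline = [re.sub(r"(?i)[^ac]", "X", s) for s in GENES]

# 3. flag attached when the pattern is compiled
compiled = re.compile(r"[^ac]", re.IGNORECASE)
by_compiled = [compiled.sub("X", s) for s in GENES]

print(by_keyword)
# ['AXC', 'AXc', 'AXC', 'AXc', 'aXC', 'aXc', 'aXC', 'aXc']
```

All three lists should be identical; which form to prefer is mostly a matter of taste, though the compiled pattern avoids repeating the flag at every call site.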

Look at the source code:

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

Clearly, when you pass re.IGNORECASE as a positional argument, it is actually bound to count. You can verify this with the following error:

>>> re.sub(r'[^ac]', "X", "ABC", re.IGNORECASE, count=2)
Traceback (most recent call last):
  File "<ipython-input-82-8b949ec4f925>", line 1, in <module>
    re.sub(r'[^ac]', "X", "ABC", re.IGNORECASE, count=2)
TypeError: sub() got multiple values for keyword argument 'count'

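So, because re.IGNORECASE equals 2, the output is 'XXC' (only the first two matches were replaced). This also explains 'aXX' for "abC": the match is still case-sensitive, so "b" and "C" are the two characters that get replaced.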

>>> re.IGNORECASE
2
>>> re.sub(r'[^ac]', "X", "ABC", re.IGNORECASE)
'XXC'
>>> re.sub(r'[^ac]', "X", "ABC", count=2)
'XXC'
>>> re.sub(r'[^ac]', "X", "ABC", 2)
'XXC'
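
To make the binding explicit, here is a small sketch using a stand-in function with the same parameter order as the `re.sub` signature quoted above. `show_binding` is a hypothetical name used only for this illustration; it does not touch `re` internals, and the code assumes Python 3.

```python
import re


def show_binding(pattern, repl, string, count=0, flags=0):
    """Stand-in with the same parameter order as re.sub (illustration only)."""
    return {"count": int(count), "flags": int(flags)}


# The fourth positional argument lands in `count`, leaving `flags` at 0.
positional = show_binding(r"[^ac]", "X", "ABC", re.IGNORECASE)

# Passing it by keyword puts it where it was intended.
keyword = show_binding(r"[^ac]", "X", "ABC", flags=re.IGNORECASE)

# The real function behaves accordingly: count=2 stops after two
# case-sensitive replacements, while flags=re.IGNORECASE replaces only "B".
limited = re.sub(r"[^ac]", "X", "ABC", count=int(re.IGNORECASE))
insensitive = re.sub(r"[^ac]", "X", "ABC", flags=re.IGNORECASE)

print(positional, keyword, limited, insensitive)
```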