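So, here is my problem:
I'm trying to replace everything in a string that is not an "A" or a "C", case-insensitively. My strings are all three characters long. (In practice the two specific letters will change, which is why I don't want to hard-code the negated values.)
So I thought I would do
re.sub(r'[^ac]', "X", "ABC", re.IGNORECASE)
But what I get is 'XXC'. I was expecting 'AXC'.
My full data set is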
map(lambda s: re.sub(r'[^ac]', "X", s, re.IGNORECASE), [ "ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc" ])
What I get is this:
['XXC', 'XXc', 'XXC', 'XXc', 'aXX', 'aXc', 'aXX', 'aXc']
Why does re.IGNORECASE cause the "A" to be replaced? And why does it sometimes replace the C? (Notice how it turns "abC" into "aXX".)
If I do this:
map(lambda s: re.sub(r'[^acAC]', "X", s), [ "ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc" ])
I get what I want:
['AXC', 'AXc', 'AXC', 'AXc', 'aXC', 'aXc', 'aXC', 'aXc']
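Do I have to use r'[^acAC]'? Is there no way to negate a character class case-insensitively?
I also find it interesting that in vim, if I put all of these strings in a text file and run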
:%s/[^ac]/X/gi
I get the right result. And, at the risk of blasphemy, if I do this in Perl:
#! /usr/bin/perl
use strict;
foreach my $gene ( "ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc") {
my $replaced = $gene;
$replaced =~ s/[^ac]/X/gi;
printf("%s\n", $replaced);
}
I get
AXC
AXc
AXC
AXc
aXC
aXc
aXC
aXc
The same goes for Ruby:
irb(main):001:0> ["ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc"].collect{|s| s.gsub(/[^ac]/i,"X") }
=> ["AXC", "AXc", "AXC", "AXc", "aXC", "aXc", "aXC", "aXc"]
How do I do the equivalent in Python without resorting to r'[^acAC]'?
Thanks!
Answer 0 (score: 2)
Pass flags as a keyword argument rather than as a positional argument:
>>> re.sub(r'[^ac]', "X", "ABC", flags=re.IGNORECASE)
'AXC'
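Since the question mentions that the two letters will change, the keyword-argument fix can be wrapped in a small helper. This is only a sketch: the name mask_except and its parameters are made up for illustration, and it assumes keep is a non-empty string of the characters to preserve.

```python
import re

def mask_except(s, keep, repl="X"):
    """Replace every character of s that is not in keep, ignoring case.

    Hypothetical helper; assumes keep is a non-empty string.
    """
    # re.escape guards against characters such as ']' or '^' inside the class
    pattern = "[^" + re.escape(keep) + "]"
    # flags is passed by keyword so it cannot land in the count parameter
    return re.sub(pattern, repl, s, flags=re.IGNORECASE)

genes = ["ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc"]
masked = [mask_except(g, "ac") for g in genes]
print(masked)
```

With keep="ac" this should give the same list the question obtained from r'[^acAC]'.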
Look at the source code:
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
This makes it clear that when you pass re.IGNORECASE as a positional argument, it is actually bound to count. You can verify this with the following error:
>>> re.sub(r'[^ac]', "X", "ABC", re.IGNORECASE, count=2)
Traceback (most recent call last):
File "<ipython-input-82-8b949ec4f925>", line 1, in <module>
re.sub(r'[^ac]', "X", "ABC", re.IGNORECASE, count=2)
TypeError: sub() got multiple values for keyword argument 'count'
So, because re.IGNORECASE equals 2, the output is 'XXC' (only two characters were replaced).
>>> re.IGNORECASE
2
>>> re.sub(r'[^ac]', "X", "ABC", re.IGNORECASE)
'XXC'
>>> re.sub(r'[^ac]', "X", "ABC", count=2)
'XXC'
>>> re.sub(r'[^ac]', "X", "ABC", 2)
'XXC'
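Two other ways to keep the flag away from count, shown here as a sketch rather than as part of the original answer: compile the pattern first, or put the inline (?i) flag in the pattern itself. The variable names are only illustrative.

```python
import re

# Option 1: compile once; the flag is bound to the pattern, not to sub()
not_ac = re.compile(r"[^ac]", re.IGNORECASE)
compiled_result = not_ac.sub("X", "ABC")

# Option 2: the inline (?i) flag makes the pattern itself case-insensitive
inline_result = re.sub(r"(?i)[^ac]", "X", "ABC")

# The flag's integer value is what was being consumed as count
flag_value = int(re.IGNORECASE)

print(compiled_result, inline_result, flag_value)
```

Either form leaves count at its default of 0, so every non-matching character is replaced.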