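I'm looking for a very specific RegEx in Python (or another solution with comparable performance) to replace patterns like the ones in these examples: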
...-1AG.,., should be transformed as ...G.,.,
..,-1A,.,., should be transformed as ..,,.,.,
...-2GTC,., should be transformed as ...C,.,
..,-2GT.,., should be transformed as ..,.,.,
...+3TAGT,, should be transformed as ...T,,
..,+3TAG.,. should be transformed as ..,.,.
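Basically:
AnySymbol (not only dots and commas), followed by a +/- sign, followed by a digit (1..9), followed by several letters whose number is given by that digit, and finally AnySymbol (not only dots and commas),
should be transformed into:
AnySymbol (not only dots and commas) followed by AnySymbol (not only dots and commas).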
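The obvious solution, String = re.sub(r'[\-\+]\d\w+', "", String), is wrong for cases like (...-1AG.,., should be transformed as ...G.,.,), because \w+ consumes every following letter instead of only as many as the digit specifies.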
So far I am looping over r'[\-\+]1\w', r'[\-\+]2\w\w', r'[\-\+]3\w\w\w' ... r'[\-\+]9\w\w\w\w\w\w\w\w\w', but I'm hoping for a more elegant solution. Any ideas?
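For reference, the loop described above could look roughly like the sketch below. The function name strip_by_loop is hypothetical, and the nine hand-written patterns are generated with a \w{n} quantifier instead of repeated \w; otherwise it is meant to mirror the approach in the question, not to improve on it.

```python
import re

def strip_by_loop(text):
    # One pass per possible count: a sign, the digit n, then exactly n word characters.
    for n in range(1, 10):
        text = re.sub(r"[-+]%d\w{%d}" % (n, n), "", text)
    return text

print(strip_by_loop("...-1AG.,.,"))
print(strip_by_loop("...+3TAGT,,"))
```

This makes up to nine passes over the string, which is the inelegance the question is asking about.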
Answer 0 (score: 3)
Take a look at this working demo.
import re

x = """...-1AG.,., should be transformed as ...G.,.,
..,-1A,.,., should be transformed as ..,,.,.,
...-2GTC,., should be transformed as ...C,.,
..,-2GT.,., should be transformed as ..,.,.,
...+3TAGT,, should be transformed as ...T,,
..,+3TAG.,. should be transformed as ..,.,."""

def repl(matchobj):
    # group(1) is the count, group(2) the run of letters after it;
    # keep only the letters beyond the first `count`.
    return matchobj.group(2)[int(matchobj.group(1)):]

print(re.sub(r"[+-](\d+)([a-zA-Z]+)", repl, x))
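You can pass your own function to re.sub to perform a customized replacement. The function is called with each match object and returns the string to substitute for that match.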
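As a minimal sketch, the same callback idea can be wrapped in a reusable function and checked against the examples from the question. The names INDEL and strip_indels are hypothetical, and restricting the run to ASCII letters with [a-zA-Z] (rather than \w) is an assumption carried over from the answer's pattern.

```python
import re

# Sign, a count (one or more digits), then a run of ASCII letters.
INDEL = re.compile(r"[+-](\d+)([a-zA-Z]+)")

def strip_indels(text):
    def repl(match):
        count = int(match.group(1))
        # Drop the first `count` letters and keep whatever follows them.
        return match.group(2)[count:]
    return INDEL.sub(repl, text)

examples = {
    "...-1AG.,.,": "...G.,.,",
    "..,-1A,.,.,": "..,,.,.,",
    "...-2GTC,.,": "...C,.,",
    "..,-2GT.,.,": "..,.,.,",
    "...+3TAGT,,": "...T,,",
    "..,+3TAG.,.": "..,.,.",
}
for raw, expected in examples.items():
    assert strip_indels(raw) == expected
```

Because the count is captured with \d+, counts above 9 are handled in the same single pass. If fewer letters follow than the count says, the slice simply returns an empty string and the whole run is removed.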