Question

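I am trying to replace the last occurrence of a substring in a string using re.sub in Python, but I am stuck on the regex pattern. Can someone help me find the right pattern?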

String = "cr US TRUMP DE NIRO 20161008cr_x080b.wmv"

or

String = "crcrUS TRUMP DE NIRO 20161008cr.xml"

I want to replace the last occurrence of "cr" and everything that follows it, up to the extension.

The desired output strings are:

"cr US TRUMP DE NIRO 20161008.wmv"
"crcrUS TRUMP DE NIRO 20161008.xml"

I am using re.sub to replace it.

re.sub('pattern', '', String)

Please advise.

Answer 1

Use a greedy quantifier and a capture group:

re.sub(r'(.*)cr[^.]*', '\\1', input)
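To see why the greedy pattern finds the last "cr", here is a minimal sketch that runs it on both sample strings from the question. The helper name strip_last_cr is hypothetical, introduced only for illustration.

```python
import re

def strip_last_cr(name):
    # Greedy (.*) extends as far right as it can, so the literal "cr" that
    # follows it is the last "cr" in the string. [^.]* then consumes
    # everything up to (but not including) the next dot, and the whole
    # match is replaced by the captured prefix.
    return re.sub(r'(.*)cr[^.]*', r'\1', name)

print(strip_last_cr("cr US TRUMP DE NIRO 20161008cr_x080b.wmv"))
# cr US TRUMP DE NIRO 20161008.wmv
print(strip_last_cr("crcrUS TRUMP DE NIRO 20161008cr.xml"))
# crcrUS TRUMP DE NIRO 20161008.xml
```

Note that this assumes the extension itself does not contain "cr"; if it does, the last "cr" is inside the extension and that part gets removed instead.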

Answer 2

An alternative solution using the str.rfind(sub[, start[, end]]) function:

string = "cr US TRUMP DE NIRO 20161008cr_x080b.wmv"
last_position = string.rfind('cr')
string = string[:last_position] + string[string.rfind('.'):]

print(string)  #cr US TRUMP DE NIRO 20161008.wmv

Also, rfind is faster in this case.
Here are the measurements:
Using str.rfind(...): 0.0054836273193359375
Using re.sub(...): 0.4017353057861328
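The answer does not say how these numbers were measured (units, iteration count), so the following is only one possible way to run a comparison, using timeit. The function names, the precompiled pattern, and the iteration count are assumptions, and the timings you see will differ by machine and Python version.

```python
import re
import timeit

SAMPLE = "cr US TRUMP DE NIRO 20161008cr_x080b.wmv"
PATTERN = re.compile(r'(.*)cr[^.]*')  # greedy pattern from Answer 1

def with_rfind(s):
    # Slice up to the last "cr", then append everything from the last dot.
    last = s.rfind('cr')
    return s[:last] + s[s.rfind('.'):]

def with_re_sub(s):
    return PATTERN.sub(r'\1', s)

n = 10_000  # arbitrary iteration count chosen for this sketch
print("rfind: ", timeit.timeit(lambda: with_rfind(SAMPLE), number=n))
print("re.sub:", timeit.timeit(lambda: with_re_sub(SAMPLE), number=n))
```

One caveat with the rfind approach: str.rfind returns -1 when the substring is not found, so a string without "cr" or without a dot would be sliced incorrectly unless that case is checked first.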

Answer 3

You can use this negative lookahead regex:

repl = re.sub(r"cr(?:(?!cr)[^.])*(?=\.[^.]+$)", "", input)


RegEx breakdown:

cr         # match cr
(?:        # non-capturing group start
   (?!     # negative lookahead start
      cr   # match cr
   )       # negative lookahead end
   [^.]    # match anything but DOT
)          # non-capturing group end
*          # repeat 0 or more times; no matched character may start another cr
(?=        # positive lookahead start
   \.      # match DOT
   [^.]+   # followed by 1 or more anything but DOT
   $       # end of input
)          # positive lookahead end
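The breakdown above can be written directly as a commented pattern with re.VERBOSE, which ignores whitespace and treats # as a comment inside the pattern. This is a sketch only; the name LAST_CR is hypothetical.

```python
import re

LAST_CR = re.compile(r"""
    cr              # literal cr
    (?:             # non-capturing group
        (?!cr)      # the next two characters must not be cr
        [^.]        # any single character except a dot
    )*              # repeated zero or more times
    (?=             # lookahead: what follows must be
        \.[^.]+$    # a dot, one or more non-dot characters, end of input
    )
""", re.VERBOSE)

for name in ("cr US TRUMP DE NIRO 20161008cr_x080b.wmv",
             "crcrUS TRUMP DE NIRO 20161008cr.xml"):
    print(LAST_CR.sub("", name))
# cr US TRUMP DE NIRO 20161008.wmv
# crcrUS TRUMP DE NIRO 20161008.xml
```

Because the lookahead only asserts the extension without consuming it, the replacement string can be empty and the extension stays in place.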

re.sub() - regex for replacing the last occurrence of a substring in a string
