Question

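Given a string such as '\url{www.mywebsite.com/home/us/index.html}', I want to replace the part of the URL up to the second-to-last forward slash with www.example.com/, so that it becomes:

\url{www.example.com/us/index.html}

I assume the URL contains at least one forward slash. Here is what I have tried so far.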

>>> import re
>>> pattern = r'(\\url{).*([^/]*/[^/]*})'
>>> prefix = r'\1www.example.com/\2'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com//index.html}'

I'm not sure why the us part is missing from the result, even though I explicitly included [^/]* in the regex.

Answer 1

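The greedy .* matches everything up to the last slash. Your second group then matches only /index.html}, and its first [^/]* matches nothing (which is allowed, because * can match zero characters).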

Put a slash after the .* to force it to stop before the second-to-last slash, so that it cannot consume the us you want the group to capture:

>>> pattern = r'(\\url{).*/([^/]*/[^/]*})'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com/us/index.html}'
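To make the difference concrete, here is a small sketch (Python 3, not part of the original answer) that inspects what the second group captures under each pattern. The variable names are just for illustration.

```python
import re

text = r'\url{www.mywebsite.com/home/us/index.html}'

# Original pattern: nothing stops the greedy .* before the last slash,
# so the first [^/]* in group 2 is left with zero characters.
greedy = re.search(r'(\\url{).*([^/]*/[^/]*})', text)

# Fixed pattern: the literal slash after .* has to match the
# second-to-last slash, which leaves "us" for group 2.
fixed = re.search(r'(\\url{).*/([^/]*/[^/]*})', text)

greedy_tail = greedy.group(2)
fixed_tail = fixed.group(2)
print(greedy_tail)  # /index.html}
print(fixed_tail)   # us/index.html}
```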

Answer 2

Alternatively, using lookahead / lookbehind:

import re
# match everything after the '{' up to (but not including) the second-to-last slash:
pattern = r'(?<={).*(?=(?:[^/]*/){2})'
prefix = r'www.example.com'
print(re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}'))

Output

\url{www.example.com/us/index.html}
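Because the lookbehind and lookahead are zero-width, only the text between them is replaced. The following sketch (Python 3, added for illustration) shows exactly which substring the pattern consumes for the sample input.

```python
import re

text = r'\url{www.mywebsite.com/home/us/index.html}'
pattern = r'(?<={).*(?=(?:[^/]*/){2})'

# The greedy .* backtracks until two slash-terminated segments remain ahead,
# so the match stops just before the second-to-last slash.
matched = re.search(pattern, text).group(0)
print(matched)  # www.mywebsite.com/home

# The '{' before the match and the '/us/index.html}' after it are untouched.
result = re.sub(pattern, 'www.example.com', text)
print(result)  # \url{www.example.com/us/index.html}
```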

Or without using a regex at all:

# keep only the last two slash-separated pieces: ['us', 'index.html}']
parts = r'\url{www.mywebsite.com/home/us/index.html}'.split('/')[-2:]
parts = [r'\url{www.example.com', parts[0], parts[1]]
print('/'.join(parts))
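The same idea can be wrapped in a small helper using str.rsplit with a split limit. This is only a sketch: the function name replace_prefix and the choice to return the input unchanged when it has fewer than two slashes are assumptions, not something the answer specifies.

```python
def replace_prefix(text, new_host='www.example.com'):
    # Split from the right at most twice:
    # [everything before, second-to-last piece, last piece].
    parts = text.rsplit('/', 2)
    if len(parts) < 3:
        # Fewer than two slashes: leaving the input as-is is an assumption.
        return text
    return '/'.join([r'\url{' + new_host] + parts[1:])


result = replace_prefix(r'\url{www.mywebsite.com/home/us/index.html}')
print(result)  # \url{www.example.com/us/index.html}
```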

Regex replacement with groups
