I want to convert a URL into NDN format, as in the following example:
https://stackoverflow.com/questions/ask
/com/stackoverflow/questions/ask
Using C++ or Python, how can I do this with a regular expression (regex)?
Answer 0 (score: 0)
Perhaps the expression
(?i)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]{2,6}(?:\.[a-z0-9]{2,6})?)\/?(.*)$
with the replacement
/\2/\1/\3
is worth a try.
import re
string = '''
https://stackoverflow.com/questions/ask
https://stackoverflow.co.uk/questions/ask
https://www.stackoverflow.com/questions/ask
http://www.stackoverflow.co.uk/questions/ask
http://www.stackoverflow.co.uk/
http://www.stackoverflow.co.uk
'''
expression = r'(?im)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]{2,6}(?:\.[a-z0-9]{2,6})?)\/?(.*)$'
print(re.sub(expression, r'/\2/\1/\3', string))
Output:
/com/stackoverflow/questions/ask
/co.uk/stackoverflow/questions/ask
/com/stackoverflow/questions/ask
/co.uk/stackoverflow/questions/ask
/co.uk/stackoverflow/
/co.uk/stackoverflow/
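If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you like, you can also see in this link how it matches against some sample inputs.
jex.im can be used to visualize regular expressions.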
Here, we create an optional capturing group (index 2 in each findall tuple) to check whether there is an instance of .co.uk:
import re
string = '''
https://stackoverflow.com/questions/ask
https://stackoverflow.co.uk/questions/ask
https://www.stackoverflow.com/questions/ask
http://www.stackoverflow.co.uk/questions/ask
http://www.stackoverflow.co.uk/
http://www.stackoverflow.co.uk
'''
expression = r'(?im)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]{2,6})\.?([a-z0-9]{2,6})?\/?(.*)$'
output = []
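for match in re.findall(expression, string):
    # match = (name, first TLD part, optional second TLD part, path)
    if match[2] != '':
        # Two-part suffix such as .co.uk: emit "uk", then "co", then the name.
        NDN = '/' + match[2] + '/' + match[1] + '/' + match[0] + '/' + match[3]
    else:
        NDN = '/' + match[1] + '/' + match[0] + '/' + match[3]
    # Make sure every name ends with a trailing slash.
    if NDN[-1] != '/':
        NDN = NDN + '/'
    output.append(NDN)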
print(output)
Output:
['/com/stackoverflow/questions/ask/', '/uk/co/stackoverflow/questions/ask/', '/com/stackoverflow/questions/ask/', '/uk/co/stackoverflow/questions/ask/', '/uk/co/stackoverflow/', '/uk/co/stackoverflow/']
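The loop above can also be packaged as a helper that converts one URL at a time. What follows is only a sketch built on the same expression; the function name url_to_ndn and the choice to return None for input that does not match are assumptions made for illustration, not part of the original answer.

```python
import re

# Same expression as above, without the multiline flag because a single URL is matched.
EXPRESSION = re.compile(
    r'(?i)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]{2,6})\.?([a-z0-9]{2,6})?\/?(.*)$'
)


def url_to_ndn(url):
    """Hypothetical helper: return the NDN-style name for url, or None if it does not match."""
    match = EXPRESSION.match(url.strip())
    if match is None:
        return None
    # Unmatched optional groups are returned as empty strings.
    name, tld_first, tld_second, path = match.groups(default='')
    # Reversed host components; the optional second TLD part leads when present.
    parts = [tld_second, tld_first, name] if tld_second else [tld_first, name]
    ndn = '/' + '/'.join(parts) + '/' + path
    # Keep the trailing slash, as the loop above does.
    return ndn if ndn.endswith('/') else ndn + '/'


print(url_to_ndn('https://stackoverflow.com/questions/ask'))
print(url_to_ndn('http://www.stackoverflow.co.uk'))
```

Returning None rather than raising keeps the caller in charge of how to treat strings that are not http or https URLs.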
If there are one or more subdomains, such as:
http://www.subdomain1.subdomain2.stackoverflow.co.uk
http://www.subdomain1.subdomain2.subdomain3.stackoverflow.co.uk
then we simply add a \.?([a-z0-9]+)? to the expression for each subdomain:
(?im)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]+)\.?([a-z0-9]+)?\.?([a-z0-9]+)?\.?([a-z0-9]+)?\.?([a-z0-9]+)?\/?(.*)$
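As a rough check of the extended expression, the sketch below applies it to the two subdomain examples and joins the non-empty host groups in reverse order. The variable names and the reversal step are assumptions added for illustration; the expression itself is the one shown above, which captures at most six host components as separate groups after an optional www.

```python
import re

urls = '''
http://www.subdomain1.subdomain2.stackoverflow.co.uk
http://www.subdomain1.subdomain2.subdomain3.stackoverflow.co.uk
'''

expression = r'(?im)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]+)\.?([a-z0-9]+)?\.?([a-z0-9]+)?\.?([a-z0-9]+)?\.?([a-z0-9]+)?\/?(.*)$'

output = []
for match in re.findall(expression, urls):
    # The last group is the path; every group before it is a host component.
    *host, path = match
    # Drop the optional groups that did not match and reverse the rest.
    components = [part for part in reversed(host) if part]
    ndn = '/' + '/'.join(components) + '/' + path
    # Keep the trailing slash, as in the earlier example.
    if not ndn.endswith('/'):
        ndn += '/'
    output.append(ndn)

print(output)
```

Because each extra subdomain needs its own optional group, the expression has to be extended again for hosts with more components than it currently captures.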