How to convert a URL to NDN format?

Time: 2019-11-16 16:31:32

Tags: regex url

I want to convert a URL to NDN format, as in the following example:

https://stackoverflow.com/questions/ask

/com/stackoverflow/questions/ask 

Using C++ or Python, how can I achieve this with a regular expression (regex)?

1 answer:

Answer 0: (score: 0)

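Method 1

Maybe the expression

(?i)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]{2,6}(?:\.[a-z0-9]{2,6})?)\/?(.*)$

with the replacement

/\2/\1/\3

is worth a look. Group 1 is the host name, group 2 the TLD (one or two labels), and group 3 the path.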

RegEx Demo

Test

import re

string = '''
https://stackoverflow.com/questions/ask
https://stackoverflow.co.uk/questions/ask
https://www.stackoverflow.com/questions/ask
http://www.stackoverflow.co.uk/questions/ask
http://www.stackoverflow.co.uk/
http://www.stackoverflow.co.uk
'''

expression = r'(?im)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]{2,6}(?:\.[a-z0-9]{2,6})?)\/?(.*)$'

print(re.sub(expression, r'/\2/\1/\3', string))

Output

/com/stackoverflow/questions/ask
/co.uk/stackoverflow/questions/ask
/com/stackoverflow/questions/ask
/co.uk/stackoverflow/questions/ask
/co.uk/stackoverflow/
/co.uk/stackoverflow/

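If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you like, you can also watch in the linked demo how it matches against some sample inputs.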
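For readers who prefer to explore the expression in code, here is a minimal sketch of the Method 1 pattern written in verbose mode, with one comment per piece. The names URL_TO_NDN and url_to_ndn are hypothetical and chosen only for this illustration; the pattern and the replacement are the ones shown above.

```python
import re

# Method 1 expression, split across lines with re.VERBOSE.
# URL_TO_NDN and url_to_ndn are illustrative names, not part of the answer.
URL_TO_NDN = re.compile(r'''
    ^(?:https?:\/\/)          # scheme ("http://" or "https://"), not captured
    (?:w{3}\.)?               # optional "www." prefix, not captured
    ([^\r\n]+?)               # group 1: host name, matched lazily up to a dot
    \.                        # the dot before the TLD
    ([a-z0-9]{2,6}            # group 2: TLD such as "com" ...
        (?:\.[a-z0-9]{2,6})?  #   ... optionally two-part, such as "co.uk"
    )
    \/?                       # optional slash after the host
    (.*)$                     # group 3: the remaining path
''', re.IGNORECASE | re.MULTILINE | re.VERBOSE)


def url_to_ndn(text):
    """Rewrite every URL line in text as /TLD/host/path."""
    return URL_TO_NDN.sub(r'/\2/\1/\3', text)


print(url_to_ndn('https://stackoverflow.com/questions/ask'))
# /com/stackoverflow/questions/ask
print(url_to_ndn('http://www.stackoverflow.co.uk'))
# /co.uk/stackoverflow/
```

Note that a two-part TLD such as co.uk stays a single component here; Method 2 splits it.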


RegEx Circuit

jex.im visualizes regular expressions:

[Image: jex.im diagram of the expression]


Method 2

Here we add an optional capture group (match[2] in the code below) to check whether the URL has a two-part suffix such as .co.uk:

RegEx Demo 2

Test

import re

string = '''
https://stackoverflow.com/questions/ask
https://stackoverflow.co.uk/questions/ask
https://www.stackoverflow.com/questions/ask
http://www.stackoverflow.co.uk/questions/ask
http://www.stackoverflow.co.uk/
http://www.stackoverflow.co.uk
'''

expression = r'(?im)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]{2,6})\.?([a-z0-9]{2,6})?\/?(.*)$'

output = []
for match in re.findall(expression, string):
    if match[2] != '':
        NDN = '/' + match[2] + '/' + match[1] + '/' + match[0] + '/' + match[3]
    else:
        NDN = '/' + match[1] + '/' + match[0] + '/' + match[3]

    if NDN[-1] != '/':
        NDN = NDN + '/'

    output.append(NDN)

print(output)

Output

['/com/stackoverflow/questions/ask/', '/uk/co/stackoverflow/questions/ask/', '/com/stackoverflow/questions/ask/', '/uk/co/stackoverflow/questions/ask/', '/uk/co/stackoverflow/', '/uk/co/stackoverflow/']
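The if/else in the Method 2 test can probably be folded into a single join over the non-empty pieces. The following is only a sketch under that assumption; the name urls_to_ndn is hypothetical, and the expression is the Method 2 one with the inline flags moved into re.compile arguments.

```python
import re

# Method 2 expression; flags passed to re.compile instead of the inline (?im).
EXPRESSION = re.compile(
    r'^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]{2,6})\.?([a-z0-9]{2,6})?\/?(.*)$',
    re.IGNORECASE | re.MULTILINE,
)


def urls_to_ndn(text):
    """Return one NDN-style name per URL line (urls_to_ndn is an illustrative name)."""
    names = []
    for host, tld1, tld2, path in EXPRESSION.findall(text):
        # Reverse the domain labels, then append the path.
        # tld2 and path are empty strings when absent, so they are filtered out.
        parts = [tld2, tld1, host, path.rstrip('/')]
        names.append('/' + '/'.join(p for p in parts if p) + '/')
    return names


print(urls_to_ndn('https://stackoverflow.co.uk/questions/ask'))
# ['/uk/co/stackoverflow/questions/ask/']
print(urls_to_ndn('http://www.stackoverflow.co.uk'))
# ['/uk/co/stackoverflow/']
```

On the six sample URLs this gives the same list as the loop above, including the trailing slash.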

Method 3

If there are one or more subdomains, such as:

http://www.subdomain1.subdomain2.stackoverflow.co.uk
http://www.subdomain1.subdomain2.subdomain3.stackoverflow.co.uk

then we simply add one \.?([a-z0-9]+)? to the expression for each additional subdomain:

(?im)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]+)\.?([a-z0-9]+)?\.?([a-z0-9]+)?\.?([a-z0-9]+)?\.?([a-z0-9]+)?\/?(.*)$

RegEx Demo 3
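Method 3 comes without a test, so here is a minimal sketch of how the expression might be applied. It assumes that all domain labels should be reversed and that a trailing slash is appended, as in Method 2; the answer does not state the expected output for subdomains. The name subdomain_urls_to_ndn is hypothetical.

```python
import re

# Method 3 expression, split over two string literals for readability.
EXPRESSION = re.compile(
    r'(?im)^(?:https?:\/\/)(?:w{3}\.)?([^\r\n]+?)\.([a-z0-9]+)\.?([a-z0-9]+)?'
    r'\.?([a-z0-9]+)?\.?([a-z0-9]+)?\.?([a-z0-9]+)?\/?(.*)$'
)


def subdomain_urls_to_ndn(text):
    """Reverse all captured domain labels (assumed ordering), then append the path."""
    names = []
    for match in EXPRESSION.finditer(text):
        # Groups 1-6 are domain labels (unused ones become ''); group 7 is the path.
        *labels, path = match.groups(default='')
        parts = [label for label in reversed(labels) if label]
        path = path.rstrip('/')
        if path:
            parts.append(path)
        names.append('/' + '/'.join(parts) + '/')
    return names


print(subdomain_urls_to_ndn('http://www.subdomain1.subdomain2.stackoverflow.co.uk'))
# ['/uk/co/stackoverflow/subdomain2/subdomain1/']
```

With six label groups, the expression covers at most six labels after the optional www. prefix; deeper hosts would need more \.?([a-z0-9]+)? groups.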