Extracting from a URL

Time: 2018-06-28 17:43:04

Tags: python

How can I extract the da_dk part from a URL? I am trying to find the country code and language code in the URL.

import re
url = "https://www.url.com/content/test/abcd/da_dk/1234.html"
#cc_lc = re.search(?, url)
cc, lc = cc_lc.split('_')
print(cc, lc)

1 answer:

Answer 0: (score: 0)

You can do something like this:

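import re
url = "https://www.url.com/content/test/abcd/da_dk/1234.html"
# Split the URL into path segments and look for one shaped like "xx_yy"
url_list = url.split('/')
for el in url_list:
    if "_" in el:
        codes = el.split("_")
        if len(codes) == 2:
            # Use a regex to check the shape of codes[0] and codes[1].
            # Assumption: each code is exactly two lowercase letters; whether
            # they are real country/language codes needs a separate lookup.
            if all(re.fullmatch(r"[a-z]{2}", code) for code in codes):
                cc, lc = codes
                print(cc, lc)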
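
Since the question asks for an re.search pattern, the same idea can also be sketched as a single regular expression. This is only a sketch under an assumption the thread does not confirm: that the segment is always two lowercase letters, an underscore, and two more lowercase letters, taking up a whole path segment. The names CODE_SEGMENT and extract_codes are hypothetical helpers introduced for illustration, and cc and lc are kept in the order the question uses.

```python
import re

# Assumed shape: "/xx_yy" followed by "/" or the end of the string.
CODE_SEGMENT = re.compile(r"/([a-z]{2})_([a-z]{2})(?=/|$)")


def extract_codes(url):
    """Return the two codes as a tuple, or None if no segment matches."""
    match = CODE_SEGMENT.search(url)
    return match.groups() if match else None


url = "https://www.url.com/content/test/abcd/da_dk/1234.html"
codes = extract_codes(url)
if codes:
    cc, lc = codes
    print(cc, lc)
```

If real URLs use uppercase or longer codes (for example da_DK), the character classes in the pattern would need to be adjusted accordingly.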