Question

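[Yes, the title is not a typo!]

I need something in Python to parse a URL. I can't believe something standard doesn't already exist. Since the URL is set in configuration, I want to make sure it isn't garbage.

There is urlparse.urlparse, but it only parses 'valid URLs' (and some invalid URLs occasionally raise an undocumented ValueError).

For example: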

>>> import urlparse
>>> urlparse.urlparse('http://aa :: aa ! aa:11.com:aa').netloc
'aa :: aa ! aa:11.com:aa'

This shows urlparse parsing, without complaint, something I consider an invalid URL.

Answer 1

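URL parsing and URL validation are actually different tasks.

urlparse.urlparse does the parsing; validation is usually done with regular expressions (the built-in re module in Python).

Here is the URL validation example from the Django framework: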
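To illustrate the split between the two tasks, here is a minimal sketch, assuming Python 3 (where the old urlparse module lives at urllib.parse). The helper name looks_like_url, the list of allowed schemes, and the hostname label rule are assumptions made for illustration, not anything the standard library provides. It parses first, then applies its own checks to the parsed pieces; it does not handle IPv6 literals or internationalized hostnames.

```python
import re
from urllib.parse import urlparse  # Python 3 location of urlparse

# Assumed rule for one hostname label: letters, digits and hyphens,
# 1 to 63 characters, not starting or ending with a hyphen.
_LABEL_RE = re.compile(r'[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?')


def looks_like_url(value, schemes=("http", "https", "ftp", "ftps")):
    """Parse with urlparse, then validate the parts ourselves (hypothetical helper)."""
    try:
        parts = urlparse(value)
        parts.port  # accessing .port can raise ValueError for a malformed port
    except ValueError:
        return False
    if parts.scheme not in schemes:
        return False
    host = parts.hostname
    if not host:
        return False
    # Every dot-separated label must satisfy the assumed label rule.
    return all(_LABEL_RE.fullmatch(label) for label in host.split("."))


if __name__ == "__main__":
    print(looks_like_url("http://example.com:8080/path"))
    print(looks_like_url("http://aa :: aa ! aa:11.com:aa"))
```

The point of the design is that urlparse only splits the string; deciding whether the pieces are acceptable is left to the caller, so the rules can be as strict or as loose as the configuration needs.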

import re

regex = re.compile(
    r'^(?:http|ftp)s?://'  # scheme: http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)
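
As a rough usage sketch, the pattern quoted above can be wrapped in a small helper and applied to the URL from the question. The pattern is copied unchanged from this answer; the names URL_RE and is_valid_url are hypothetical and chosen only for this example.

```python
import re

# Pattern copied from the answer above; only the variable name differs.
URL_RE = re.compile(
    r'^(?:http|ftp)s?://'
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_valid_url(url):
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return URL_RE.match(url) is not None


if __name__ == "__main__":
    for candidate in ("http://example.com",
                      "http://localhost:8000/",
                      "http://aa :: aa ! aa:11.com:aa"):
        print(candidate, is_valid_url(candidate))
```

With this pattern the first two candidates are accepted and the URL from the question is rejected, because a space cannot appear inside the host part.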

Invalid URL parser in Python
