Question

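I'm trying to strip substrings from the start of a string based on some conditions.

For example, if the input is a domain name prefixed with http, https and/or www, I need to remove those prefixes and return only the domain.

Here is what I have so far: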

# Drop the scheme first, if there is one
if my_domain.startswith("http://"):
    my_domain = my_domain[7:]
elif my_domain.startswith("https://"):
    my_domain = my_domain[8:]

# Then drop a leading "www."
if my_domain.startswith("www."):
    my_domain = my_domain[4:]

print(my_domain)

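I would like to stick with built-in methods such as .startswith rather than use a regular expression.

The code above works, but is there a more efficient way to combine the conditions, either to shorten the code or to do several checks in a single conditional statement?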

Answer 1

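I know regular expressions are computationally slower than many of the built-in methods, but they make the code much easier to write :)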

import re
re.sub(r"http[s]*://|www\.", "", my_domain)

Edit: As @Dunes pointed out, a more correct way to solve this problem is:

re.sub(r"^https?://(www\.)?" , "" , my_domain)

The old answer is left for reference so that Dunes' comment still has some context.
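To see why the anchored pattern is the safer of the two, here is a small sketch that runs both against the same input. The helper `strip_prefix` and the third pattern, `OPTIONAL_SCHEME_PATTERN`, are not part of the answer; they are assumptions added for illustration. The third pattern makes the scheme optional as well, so that a bare "www." prefix is handled the way the question's code handles it.

```python
import re

# The two patterns from the answer above
OLD_PATTERN = r"http[s]*://|www\."
ANCHORED_PATTERN = r"^https?://(www\.)?"

# Hypothetical variant (not from the answer): the scheme is optional too,
# so "www.example.com" without "http://" is also stripped.
OPTIONAL_SCHEME_PATTERN = r"^(?:https?://)?(?:www\.)?"


def strip_prefix(pattern, my_domain):
    """Remove every match of pattern from my_domain."""
    return re.sub(pattern, "", my_domain)


url = "http://mywww.example.com"

# The unanchored pattern also removes "www." from the middle of the host.
print(strip_prefix(OLD_PATTERN, url))        # myexample.com

# The anchored pattern only touches the start of the string.
print(strip_prefix(ANCHORED_PATTERN, url))   # mywww.example.com

# Without a scheme, the anchored pattern leaves "www." in place ...
print(strip_prefix(ANCHORED_PATTERN, "www.example.com"))         # www.example.com

# ... while the optional-scheme variant removes it.
print(strip_prefix(OPTIONAL_SCHEME_PATTERN, "www.example.com"))  # example.com
```

Whether the scheme should be optional depends on the input you expect; the question's wording ("and/or") suggests it probably should be.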

Answer 2

Use urllib.parse (Python 3).

>>> from urllib import parse
>>> components = parse.urlsplit('http://stackoverflow.com/questions/38187220/stripping-multiple-characters-from-the-start-of-a-string')
>>> components[1]
'stackoverflow.com'

The Python 2.7 equivalent is named urlparse.
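One thing to watch for: urlsplit only recognizes a network location when it is introduced by '//', so an input such as 'www.stackoverflow.com/questions' ends up entirely in the path component. The sketch below works around that by prepending '//' when it is missing. The function name `extract_host` is an assumption made for this example, not something from the standard library.

```python
from urllib.parse import urlsplit


def extract_host(url):
    """Return the network location of url, tolerating a missing scheme."""
    # Without a leading '//' (or 'scheme://'), urlsplit treats the whole
    # string as a path and leaves netloc empty, so add the '//' ourselves.
    if '://' not in url and not url.startswith('//'):
        url = '//' + url
    return urlsplit(url).netloc


print(extract_host('http://stackoverflow.com/questions/38187220'))  # stackoverflow.com
print(extract_host('www.stackoverflow.com/questions'))              # www.stackoverflow.com
```

Note that the result still contains any 'www.' prefix, and it will include a port or credentials if the URL has them.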

To cover the 'www.' case, you can simply do

# Collect any leading labels (such as 'www') in subdomains
*subdomains, domain, ending = components[1].split('.')
my_domain = '.'.join((domain, ending))

In Python 2.7 you don't have access to * unpacking, but you can use list slicing to get the same effect.
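A minimal sketch of the list-slicing version, which runs on both Python 2.7 and Python 3. The function name `strip_subdomains` is made up for this example. Like the unpacking version, it simply keeps the last two dot-separated labels, so it is only right for hosts with a single-label suffix such as '.com'; a host like 'www.example.co.uk' would come back as 'co.uk'.

```python
def strip_subdomains(netloc):
    """Keep only the last two dot-separated labels of a host name."""
    parts = netloc.split('.')
    # parts[-2:] plays the role of (domain, ending) in the unpacking version;
    # everything before it corresponds to subdomains and is dropped.
    return '.'.join(parts[-2:])


print(strip_subdomains('www.stackoverflow.com'))  # stackoverflow.com
print(strip_subdomains('stackoverflow.com'))      # stackoverflow.com
```

Unlike the unpacking version, slicing does not raise an error when the host has fewer than two labels; it just returns the input unchanged.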
