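Suppose I have a string of the form host:port, where :port is optional. How can I reliably extract the two components?
The host can be any of the following: localhost, www.google.com, 1.2.3.4, [aaaa:bbbb::cccc]. In other words, this is the standard format used across the internet (for example in URIs; the full grammar is at https://tools.ietf.org/html/rfc3986#section-3.2, excluding the "userinfo" component).
So, some possible inputs and the desired outputs: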
'localhost' -> ('localhost', None)
'my-example.com:1234' -> ('my-example.com', 1234)
'1.2.3.4' -> ('1.2.3.4', None)
'[0abc:1def::1234]' -> ('[0abc:1def::1234]', None)
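To compare candidate parsers against these examples, the pairs can be written down as a small table-driven check. This is only a sketch: the names `CASES` and `check_parser` are made up for illustration and are not part of the question.

```python
# Expected (input, output) pairs copied from the question.
CASES = [
    ("localhost", ("localhost", None)),
    ("my-example.com:1234", ("my-example.com", 1234)),
    ("1.2.3.4", ("1.2.3.4", None)),
    ("[0abc:1def::1234]", ("[0abc:1def::1234]", None)),
]

def check_parser(parse):
    """Return (input, actual) for every case where parse() disagrees.

    `parse` is any callable taking a string; its result is converted to a
    tuple so that list-returning parsers can be compared as well.
    """
    failures = []
    for text, expected in CASES:
        try:
            actual = tuple(parse(text))
        except Exception as exc:  # record the exception as the result
            actual = exc
        if actual != expected:
            failures.append((text, actual))
    return failures
```

An empty list from `check_parser(some_parser)` means the parser reproduces all four examples; it says nothing about inputs outside this table.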
Answer 0 (score: 1)
This should handle the whole parse in a single regular expression:
import re

regex = re.compile(r'''
    (                   # first capture group = Addr
      \[                # literal open bracket                  IPv6
        [:a-fA-F0-9]+   # one or more of these characters
      \]                # literal close bracket
      |                 # ALTERNATELY
      (?:               #                                       IPv4
        \d{1,3}\.       # one to three digits followed by a period
      ){3}              # ...repeated three times
      \d{1,3}           # followed by one to three digits
      |                 # ALTERNATELY
      [-a-zA-Z0-9.]+    # one or more hostname chars            Hostname
    )                   # end first capture group
    (?:
      :                 # a literal :
      (                 # second capture group = PORT
        \d+             # one or more digits
      )                 # end second capture group
    )?                  # ...or not.''', re.X)
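All that's needed then is to convert the second group to an int.
def parse_hostport(hp):
    # regex from above should be defined here.
    m = regex.match(hp)
    if m is None:
        # e.g. an empty string, which matches none of the host alternatives
        raise ValueError("Invalid host:port input %r" % hp)
    addr, port = m.group(1, 2)
    try:
        return (addr, int(port))
    except TypeError:
        # port is None
        return (addr, None)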
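One thing to keep in mind with this approach: `match()` only anchors at the start of the string, so an input such as 'localhost:abc' matches the 'localhost' prefix and comes back as ('localhost', None) instead of being rejected. If malformed input should raise, a stricter variant could use `fullmatch()`. The sketch below is an assumption about how that might look, not the answer's own code; `HOSTPORT` and `parse_hostport_strict` are hypothetical names, and the pattern is the same three alternatives written more compactly.

```python
import re

# Same host alternatives as the answer's pattern, in compact verbose form.
HOSTPORT = re.compile(r'''
    (                               # group 1: host
        \[ [:a-fA-F0-9]+ \]         #   bracketed IPv6 literal
      | (?: \d{1,3}\. ){3} \d{1,3}  #   dotted-quad IPv4
      | [-a-zA-Z0-9.]+              #   hostname characters
    )
    (?: : (\d+) )?                  # group 2: optional port digits
''', re.X)

def parse_hostport_strict(hp):
    # fullmatch() requires the whole string to fit the pattern,
    # so trailing garbage is rejected instead of silently ignored.
    m = HOSTPORT.fullmatch(hp)
    if m is None:
        raise ValueError("Invalid host:port input %r" % hp)
    addr, port = m.group(1, 2)
    return (addr, int(port) if port is not None else None)
```

With this variant, `parse_hostport_strict('localhost:abc')` raises ValueError, while the four inputs from the question still produce the desired tuples.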
Answer 1 (score: 0)
Here's my attempt so far:
import re

def parse_hostport(hp):
    """ parse a host:port pair
    """
    # start by special-casing the ipv6 literal case
    x = re.match(r'^(\[[0-9a-fA-F:]+\])(:(\d+))?$', hp)
    if x is not None:
        return x.group(1, 3)
    # otherwise, just split at the (hopefully only) colon
    splits = hp.split(':')
    if len(splits) == 1:
        return splits + [None,]
    elif len(splits) == 2:
        return splits
    raise ValueError("Invalid host:port input '%s'" % hp)
Answer 2 (score: 0)
Here's a terser implementation, which relies on attempting to parse the last component as an int:
def parse_hostport(s):
    out = s.rsplit(":", 1)
    try:
        out[1] = int(out[1])
    except (IndexError, ValueError):
        # couldn't parse the last component as a port, so let's
        # assume there isn't a port.
        out = (s, None)
    # always hand back a tuple, whether or not a port was found
    return tuple(out)
Answer 3 (score: 0)
def split_host_port(string):
    if not string.rsplit(':', 1)[-1].isdigit():
        return (string, None)
    string = string.rsplit(':', 1)
    host = string[0]  # 1st index is always host
    port = int(string[1])
    return (host, port)
I'm actually a bit unsure whether this is what you wanted, but I rewrote it a little and it still seems to produce the desired output:
>>> split_host_port("localhost")
('localhost', None)
>>> split_host_port("example.com:1234")
('example.com', 1234)
>>> split_host_port("1.2.3.4")
('1.2.3.4', None)
>>> split_host_port("[0abc:1def::1234]")
('[0abc:1def::1234]', None)
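I'm not too fond of the chained function calls on the first line, e.g. getattr(getattr(getattr(string, 'rsplit')(':', 1), '__getitem__')(-1), 'isdigit')() for the expanded version, which are then repeated two lines later; maybe I should store the result in a variable so all those calls aren't needed.
But I'm nitpicking here, so feel free to call me out on it, heh.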
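As a sketch of the "store it in a variable" idea, `str.rpartition` splits once at the last colon and returns all three pieces, so nothing has to be recomputed. The name `split_host_port_once` is made up here to keep it apart from the function above. A side effect of checking the separator explicitly is that a colon-free, all-digit input such as "1234" is returned as a host with no port, whereas the version above would reach `string[1]` on a one-element list and raise IndexError.

```python
def split_host_port_once(string):
    # rpartition returns (head, separator, tail); when there is no colon
    # the separator is '' and the whole input ends up in the tail.
    host, sep, port = string.rpartition(':')
    if not sep or not port.isdigit():
        # no colon at all, or the text after the last colon is not a port
        return (string, None)
    return (host, int(port))
```

It is meant to give the same results as the session shown above for those four inputs.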
Answer 4 (score: 0)
Here's my final attempt, with credit to the other answerers who provided inspiration:
def parse_hostport(s, default_port=None):
    # endswith() also copes with an empty string, where s[-1] would fail
    if s.endswith(']'):
        # ipv6 literal (with no port)
        return (s, default_port)
    out = s.rsplit(":", 1)
    if len(out) == 1:
        # No port
        port = default_port
    else:
        try:
            port = int(out[1])
        except ValueError:
            raise ValueError("Invalid host:port '%s'" % s)
    return (out[0], port)
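Note that `int()` is fairly lenient: it accepts strings such as '-1', '+80' or ' 80', and it places no upper bound on the value. If only real port numbers should get through, a small range check could be applied to the parsed port afterwards. The helper below, `validate_port`, is a hypothetical addition and not part of the answer; it assumes the usual 0..65535 range for TCP/UDP ports.

```python
def validate_port(port):
    """Return port unchanged if it is None or within 0..65535, else raise."""
    if port is not None and not 0 <= port <= 65535:
        raise ValueError("Port out of range: %r" % port)
    return port
```

It would be used on the second element of the returned tuple, for example `host, port = parse_hostport(s)` followed by `port = validate_port(port)`.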
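Answer 5 (score: 0)
Well, this is Python, with batteries included. You've already mentioned that the format is the standard one used in URIs, so what about urllib.parse?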
import urllib.parse

def parse_hostport(hp):
    # urlparse() and urlsplit() insist on absolute URLs starting with "//"
    result = urllib.parse.urlsplit('//' + hp)
    return result.hostname, result.port
This should handle any valid host:port you can throw at it.
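Two details of `urlsplit()` are worth knowing when comparing against the outputs asked for in the question: the `hostname` attribute is lowercased, and for an IPv6 literal it comes back without the square brackets. Accessing `port` also raises ValueError when the port is not a number in the range 0..65535. If the bracketed form is wanted, a thin wrapper could put the brackets back; `parse_hostport_keep_brackets` is a hypothetical name, and treating any host containing ':' as an IPv6 literal is an assumption of this sketch.

```python
import urllib.parse

def parse_hostport_keep_brackets(hp):
    result = urllib.parse.urlsplit('//' + hp)
    host = result.hostname  # lowercased; IPv6 brackets already stripped
    if host is not None and ':' in host:
        # assumption: only an IPv6 literal can contain ':' at this point
        host = '[' + host + ']'
    return host, result.port

print(parse_hostport_keep_brackets('my-example.com:1234'))  # ('my-example.com', 1234)
print(parse_hostport_keep_brackets('[0abc:1def::1234]'))    # ('[0abc:1def::1234]', None)
print(parse_hostport_keep_brackets('LocalHost'))            # ('localhost', None)
```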