How to parse a host:port pair in Python

Date: 2017-10-22 16:54:47

Tags: python

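Suppose I have a string of the form host:port, where :port is optional. How can I reliably extract the two components?

The host can be any of the following:

  • a hostname (localhost, www.google.com)
  • an IPv4 literal (1.2.3.4)
  • an IPv6 literal ([aaaa:bbbb::cccc]).

In other words, this is the standard format used across the Internet (e.g. in URIs: the full grammar is at https://tools.ietf.org/html/rfc3986#section-3.2, excluding the "userinfo" component).

So, some possible inputs and the desired outputs: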

'localhost' -> ('localhost', None)
'my-example.com:1234' -> ('my-example.com', 1234)
'1.2.3.4' -> ('1.2.3.4', None)
'[0abc:1def::1234]' -> ('[0abc:1def::1234]', None)
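
To check any candidate implementation, the cases above can be put in a small table. This is only a sketch: the names EXPECTED and check are made up for illustration and are not part of the question.

```python
# Inputs and desired outputs, copied from the list above
EXPECTED = {
    'localhost': ('localhost', None),
    'my-example.com:1234': ('my-example.com', 1234),
    '1.2.3.4': ('1.2.3.4', None),
    '[0abc:1def::1234]': ('[0abc:1def::1234]', None),
}

def check(parse):
    """Return a list of (input, got, wanted) for every mismatch."""
    failures = []
    for text, wanted in EXPECTED.items():
        got = tuple(parse(text))  # accept lists as well as tuples
        if got != wanted:
            failures.append((text, got, wanted))
    return failures
```

An empty list from check(parse_hostport) means the function agrees with all four cases.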

6 answers:

Answer 0 (score: 1)

This should handle the entire parse in a single regex:

import re

regex = re.compile(r'''
(                            # first capture group = Addr
  \[                         # literal open bracket                       IPv6
    [:a-fA-F0-9]+            # one or more of these characters
  \]                         # literal close bracket
  |                          # ALTERNATELY
  (?:                        #                                            IPv4
    \d{1,3}\.                # one to three digits followed by a period
  ){3}                       # ...repeated three times
  \d{1,3}                    # followed by one to three digits
  |                          # ALTERNATELY
  [-a-zA-Z0-9.]+              # one or more hostname chars ([-\w\d\.])      Hostname
)                            # end first capture group
(?:                          
  :                          # a literal :
  (                          # second capture group = PORT
    \d+                      # one or more digits
  )                          # end second capture group
 )?                          # ...or not.''', re.X)

All that's needed then is to convert the second group to int.

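def parse_hostport(hp):
    # regex from above should be defined here.
    m = regex.match(hp)
    if m is None:
        # no alternative matched at the start of the string
        raise ValueError("Invalid host:port input %r" % hp)
    addr, port = m.group(1, 2)
    # port is None when the optional ":port" part is absent
    return (addr, int(port) if port is not None else None)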
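
A caveat worth noting: regex.match() only anchors at the start of the string, so an input such as 'host:abc' still matches, with the trailing ':abc' silently ignored. If rejecting such input is preferred, a stricter variant could use fullmatch(). The sketch below assumes Python 3.4+ and uses a compact one-line form of the same pattern; HOSTPORT and parse_hostport_strict are hypothetical names, not part of the answer.

```python
import re

# Compact form of the pattern above:
# bracketed IPv6 | IPv4 | hostname, then an optional :port
HOSTPORT = re.compile(
    r'(\[[:a-fA-F0-9]+\]|(?:\d{1,3}\.){3}\d{1,3}|[-a-zA-Z0-9.]+)(?::(\d+))?'
)

def parse_hostport_strict(hp):
    # fullmatch() requires the whole string to match, not just a prefix
    m = HOSTPORT.fullmatch(hp)
    if m is None:
        raise ValueError("Invalid host:port input %r" % hp)
    addr, port = m.group(1, 2)
    return (addr, int(port) if port is not None else None)

# match() accepts the prefix 'host' and drops the rest
print(HOSTPORT.match('host:abc').group(1, 2))        # ('host', None)
print(parse_hostport_strict('my-example.com:1234'))  # ('my-example.com', 1234)
```

With the strict variant, parse_hostport_strict('host:abc') raises ValueError instead of returning ('host', None).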

Answer 1 (score: 0)

Here's my attempt so far:

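import re

def parse_hostport(hp):
    """ parse a host:port pair
    """
    # start by special-casing the ipv6 literal case
    # (raw string, so the backslashes reach the regex engine untouched)
    x = re.match(r'^(\[[0-9a-fA-F:]+\])(:(\d+))?$', hp)
    if x is not None:
        host, port = x.group(1, 3)
        return (host, int(port) if port is not None else None)

    # otherwise, just split at the (hopefully only) colon
    splits = hp.split(':')

    if len(splits) == 1:
        return (splits[0], None)
    elif len(splits) == 2:
        # int() raises ValueError for a non-numeric port
        return (splits[0], int(splits[1]))

    raise ValueError("Invalid host:port input '%s'" % hp)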

Answer 2 (score: 0)

Here's a terser implementation that relies on trying to parse the last component as an int:

def parse_hostport(s):
    out = s.rsplit(":", 1)
    try:
        out[1] = int(out[1])
    except (IndexError, ValueError):
        # couldn't parse the last component as a port, so let's
        # assume there isn't a port.
        return (s, None)
    # return a tuple in both branches, not a list in one of them
    return tuple(out)

Answer 3 (score: 0)

def split_host_port(string):
    if not string.rsplit(':', 1)[-1].isdigit():
        return (string, None)

    string = string.rsplit(':', 1)

    host = string[0]  # 1st index is always host
    port = int(string[1])

    return (host, port)

I'm actually not sure whether this is what you wanted, but I rewrote it a bit and it still seems to produce the desired output:

>>> split_host_port("localhost")
('localhost', None)
>>> split_host_port("example.com:1234")
('example.com', 1234)
>>> split_host_port("1.2.3.4")
('1.2.3.4', None)
>>> split_host_port("[0abc:1def::1234]")
('[0abc:1def::1234]', None)

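I'm not too fond of the chained function calls on the first line (expanded, that is getattr(getattr(getattr(string, 'rsplit')(':', 1), '__getitem__')(-1), 'isdigit')()), which are then repeated two lines later. Maybe I should store the result in a variable so that all those calls aren't needed.

But I'm nitpicking here, so feel free to call me out on it, heh.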
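
The variable idea mentioned above could look roughly like this. It is a sketch rather than the answerer's code: it swaps rsplit for str.rpartition so the string is split only once, and the name split_host_port_once is made up for illustration.

```python
def split_host_port_once(string):
    # rpartition splits at the last ':' exactly once and always
    # returns three parts; sep is '' when there is no colon at all
    host, sep, port = string.rpartition(':')
    if not sep or not port.isdigit():
        # no colon, or the last component is not a number
        return (string, None)
    return (host, int(port))

print(split_host_port_once("example.com:1234"))   # ('example.com', 1234)
print(split_host_port_once("[0abc:1def::1234]"))  # ('[0abc:1def::1234]', None)
```

As in the original, a bracketed IPv6 literal without a port falls through because its last component ends in ']' and so is not all digits.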

Answer 4 (score: 0)

Here's my final attempt, with credit to the other answerers who provided the inspiration:

def parse_hostport(s, default_port=None):
    if s[-1] == ']':
        # ipv6 literal (with no port)
        return (s, default_port)

    out = s.rsplit(":", 1)
    if len(out) == 1:
        # No port
        port = default_port
    else:
        try:
            port = int(out[1])
        except ValueError:
            raise ValueError("Invalid host:port '%s'" % s)

    return (out[0], port)

Answer 5 (score: 0)

Well, this is Python, batteries included. You've already mentioned that the format is the standard one used in URIs, so how about urllib.parse?

import urllib.parse

def parse_hostport(hp):
    # urlparse() and urlsplit() only recognize a network location
    # if the string starts with "//"
    result = urllib.parse.urlsplit('//' + hp)
    return result.hostname, result.port

This should handle any valid host:port you can throw at it.
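
One thing to keep in mind: result.hostname is lowercased and has the square brackets of an IPv6 literal removed, and result.port raises ValueError for a non-numeric or out-of-range port. If the bracketed form from the question is wanted, the brackets can be put back. A minimal sketch, where parse_hostport_bracketed is a hypothetical name and not part of the answer:

```python
import urllib.parse

def parse_hostport_bracketed(hp):
    result = urllib.parse.urlsplit('//' + hp)
    host = result.hostname
    # hostname drops the brackets of an IPv6 literal; a colon can only
    # appear in the host part of such a literal, so use it to restore them
    if host is not None and ':' in host:
        host = '[' + host + ']'
    # accessing .port raises ValueError if the port is invalid
    return host, result.port

print(parse_hostport_bracketed('[0abc:1def::1234]'))    # ('[0abc:1def::1234]', None)
print(parse_hostport_bracketed('My-Example.com:1234'))  # ('my-example.com', 1234)
```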