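Extracting hostnames from a single line of text using a regular expression

Time: 2016-07-28 21:01:38

Tags: python regex

I'm trying to write a Python script that extracts all of the Google Cloud Compute subnets from Google's DNS records. More information about this: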

https://cloud.google.com/compute/docs/faq#where_can_i_find_short_product_name_ip_ranges

So far I can pull the list of TXT records for a single hostname as a plain string without any problems.

import dns.resolver

# Set the resolver
my_resolver = dns.resolver.Resolver()
my_resolver.nameservers = ['8.8.8.8']

answer = my_resolver.query('_cloud-netblocks.googleusercontent.com', 'TXT')

for rdata in answer:
    for txt_string in rdata.strings:
        txt_record = txt_string

That leaves me with a string like this:

v=spf1 include:_cloud-netblocks1.googleusercontent.com include:_cloud-netblocks2.googleusercontent.com include:_cloud-netblocks3.googleusercontent.com include:_cloud-netblocks4.googleusercontent.com include:_cloud-netblocks5.googleusercontent.com ?all

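What I want to do is use re.match to extract the 5 hostnames from this initial response, so that I can look each one up in turn, pull out the subnets, and put them into an array. All of my efforts with regular expressions so far have been less than... great... and I was wondering if anyone could offer some guidance. Thanks!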

Edit:

Here is the full script, for anyone else who needs to collect all of the Google Cloud IP ranges.

import dns.resolver
import re

# Set the resolver
my_resolver = dns.resolver.Resolver()
my_resolver.nameservers = ['8.8.8.8']

answer = my_resolver.query('_cloud-netblocks.googleusercontent.com', 'TXT')

for rdata in answer:
    for txt_string in rdata.strings:
        txt_record = txt_string

# Extract hostnames into array
hostnames = [x.split(":")[1] for x in txt_record.split() if ":" in x]
total_subnets = []

for host in hostnames:
    answer = my_resolver.query(host, 'TXT')

    for rdata in answer:
        for txt_string in rdata.strings:
            txt_record = txt_string

    ip4_subnets = re.findall(r'ip4:(\S+)', txt_record)
    ip6_subnets = re.findall(r'ip6:(\S+)', txt_record)

    # Collect the IPv4 ranges first, then the IPv6 ranges
    total_subnets.extend(ip4_subnets)
    total_subnets.extend(ip6_subnets)

print(total_subnets)
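
A note for readers on Python 3: dnspython typically returns each TXT character-string as bytes there, and a single TXT record may arrive as several such strings. Below is a minimal offline sketch of the parsing half only, with no DNS lookup. The helper names join_txt_strings and extract_subnets are hypothetical, and the sample record uses made-up documentation address ranges, not real Google data.

```python
import re


def join_txt_strings(strings):
    """Concatenate the pieces of one TXT record, decoding bytes to str."""
    parts = [p.decode("ascii") if isinstance(p, bytes) else p for p in strings]
    return "".join(parts)


def extract_subnets(txt_record):
    """Return every ip4:/ip6: value found in an SPF-style record."""
    return re.findall(r'ip[46]:(\S+)', txt_record)


# Made-up sample record, split into two pieces the way a long TXT record can be.
sample_strings = (b"v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.0/24 ",
                  b"ip6:2001:db8::/32 ?all")

record = join_txt_strings(sample_strings)
total_subnets = extract_subnets(record)
print(total_subnets)
```

Unlike the script above, this sketch returns the ranges in the order they appear in the record rather than all IPv4 ranges followed by all IPv6 ranges.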

1 Answer:

Answer 0: (score: 1)

You don't need a regex here; use split twice together with a list comprehension:

s = "v=spf1 include:_cloud-netblocks1.googleusercontent.com include:_cloud-netblocks2.googleusercontent.com include:_cloud-netblocks3.googleusercontent.com include:_cloud-netblocks4.googleusercontent.com include:_cloud-netblocks5.googleusercontent.com ?all"
print([x.split(":")[1] for x in s.split() if ":" in x])
# => ['_cloud-netblocks1.googleusercontent.com', 
#     '_cloud-netblocks2.googleusercontent.com',
#     '_cloud-netblocks3.googleusercontent.com',
#     '_cloud-netblocks4.googleusercontent.com',
#     '_cloud-netblocks5.googleusercontent.com']

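See the demo here.

Details:

  • s.split() - splits the string on whitespace
  • if ":" in x - keeps only the entries that contain a :
  • x.split(":")[1] - splits each of those entries on : and takes the second chunk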
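
The three steps above can be traced one at a time. This is a minimal sketch on a shortened version of the sample string; the names tokens, with_colon and hostnames are only for illustration. The last two lines show a variant, split(":", 1), which is an assumption about what you may want when the value itself contains colons (an IPv6 range, for instance), since split(":")[1] keeps only the chunk up to the next colon.

```python
s = ("v=spf1 include:_cloud-netblocks1.googleusercontent.com "
     "include:_cloud-netblocks2.googleusercontent.com ?all")

# Step 1: split the string on whitespace.
tokens = s.split()

# Step 2: keep only the entries that contain a colon.
with_colon = [x for x in tokens if ":" in x]

# Step 3: split each entry on ":" and take the second chunk.
hostnames = [x.split(":")[1] for x in with_colon]
print(hostnames)

# Made-up entry whose value contains colons (documentation IPv6 range).
entry = "ip6:2001:db8::/32"
print(entry.split(":")[1])     # only the chunk up to the next colon
print(entry.split(":", 1)[1])  # everything after the first colon
```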

Of course, you can use a regex if you prefer:

include:(\S+)

This matches include: and captures one or more non-whitespace characters into Group 1. re.findall then returns the list of captured values (re.findall(r'include:(\S+)', s)).
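
As a small runnable sketch of the regex variant, on a shortened version of the sample string: re.findall scans the whole string and returns every Group 1 capture, whereas re.match only tries to match at the very start of the string, so it finds nothing here because the string begins with v=spf1.

```python
import re

s = ("v=spf1 include:_cloud-netblocks1.googleusercontent.com "
     "include:_cloud-netblocks2.googleusercontent.com ?all")

pattern = r'include:(\S+)'

# re.findall returns a list with the Group 1 capture of every match.
hostnames = re.findall(pattern, s)
print(hostnames)

# re.match is anchored at position 0, where the text is "v=spf1",
# so it returns None for this string.
print(re.match(pattern, s))
```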