Python substring matching: match a domain but exclude its subdomains

Date: 2015-02-14 00:02:52

Tags: python subdomain match

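I have a list of hosts that belong to several domains and subdomains. I am trying to convert the list into a dict of lists so that the hosts are organized by domain/subdomain.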

Python's 'in' string matching matches the domain and all of its subdomains. I tried /(?!sub).domain/ as my regex, but it does not seem to match correctly.
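One likely reason the pattern /(?!sub).domain/ misbehaves: (?!sub) is a lookahead, so it inspects the characters that follow the current position and never looks at the label that precedes ".domain". The sketch below contrasts it with a negative lookbehind on two of the sample hosts. The variable names are illustrative, and the lookbehind pattern is an assumed alternative, not something taken from the original post.

```python
import re

sample_hosts = ['host1.domain.com', 'host20.sub.domain.com']

# Plain substring test: every host containing 'domain' matches,
# including the host that lives under sub.domain.
substring_hits = [h for h in sample_hosts if 'domain' in h]

# Negative lookahead: checks the text *after* the current position,
# so the 'sub' label sitting before '.domain' is never examined.
lookahead = re.compile(r'(?!sub).domain')
lookahead_hits = [h for h in sample_hosts if lookahead.search(h)]

# Negative lookbehind: rejects '.domain' when 'sub' comes directly before it.
lookbehind = re.compile(r'(?<!sub)\.domain')
lookbehind_hits = [h for h in sample_hosts if lookbehind.search(h)]

print(substring_hits)
print(lookahead_hits)
print(lookbehind_hits)
```

A lookbehind like this only excludes the one subdomain named in the pattern, so it does not scale well to a long list of subdomains.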

I am trying to translate List1 (host_list) into a dict based on List2 (domain_list):

# A list of every host
host_list = [
  'host1.domain.com',
  'host2.domain.com',
  'host20.sub.domain.com',
  'host31.sub.domain.com',
  'host1.example.com',
  'host1.sub.example.com'
]

# A list of all domains we want to organize in the dictionary
domain_list = [
  'two.sub.domain',
  'sub.example',
  'sub.domain',
  'domain',
  'example'
]

Desired result:

domain_dict = {
  'domain': ['host1.domain.com', 'host2.domain.com'],
  'sub.domain': ['host20.sub.domain.com', 'host31.sub.domain.com'],
  'example': ['host1.example.com'],
  'sub.example': ['host1.sub.example.com']
}
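If every host name has the shape host.domain-part.tld with a single-label TLD (an assumption that holds for the sample data but not for names such as example.co.uk), the grouping key can be read straight off the host name without any matching. A minimal sketch, where domain_key is a hypothetical helper name:

```python
from collections import defaultdict

host_list = [
    'host1.domain.com',
    'host2.domain.com',
    'host20.sub.domain.com',
    'host31.sub.domain.com',
    'host1.example.com',
    'host1.sub.example.com',
]

def domain_key(host):
    """Drop the first label (the host name) and the last label (the TLD)."""
    labels = host.split('.')
    return '.'.join(labels[1:-1])

# Group hosts under the key derived from their own name
domain_dict = defaultdict(list)
for host in host_list:
    domain_dict[domain_key(host)].append(host)

print(dict(domain_dict))
```

This variant ignores domain_list entirely, so it only fits when every domain found in the host names should get its own entry.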

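Here is the solution we settled on. It still uses a single domain list and supports multiple levels of subdomains.

One caveat is that the domain list must start with the deepest (most specific) subdomains. See the order of domain_list, where sub.domain comes before domain.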

# We want to protect the original host list
host_list_copy = list(host_list)

# Start from an empty dictionary; the loop below fills it in
domain_dict = {}

for domain in domain_list:
    # Get only the hosts that are part of the same subdomain/domain
    temp_host_list = [x for x in host_list_copy if (domain in x)]
    # Add the list to the dictionary
    domain_dict[domain] = temp_host_list
    # Remove the temp_host_list records from the original host_list_copy 
    host_list_copy[:] = [x for x in host_list_copy if x not in temp_host_list]
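The ordering caveat can be removed by sorting the domains by depth before the loop runs. The sketch below is one way to do that, under the assumption that a name with more dots is more specific. group_hosts is a hypothetical helper name, and the check on whole labels ('.domain.' instead of 'domain') is an added safeguard that the original loop does not have.

```python
def group_hosts(hosts, domains):
    # Most specific (most labels) first, so the caller's order no longer matters
    ordered = sorted(domains, key=lambda d: d.count('.'), reverse=True)
    grouped = {d: [] for d in ordered}
    for host in hosts:
        for d in ordered:
            # Require whole labels: '.domain.' does not match '.mydomain.'
            if '.' + d + '.' in host:
                grouped[d].append(host)
                break  # each host is assigned to exactly one domain
    return grouped

hosts = [
    'host1.domain.com',
    'host20.sub.domain.com',
    'host1.example.com',
    'host1.sub.example.com',
]
# Deliberately listed from least to most specific
result = group_hosts(hosts, ['domain', 'example', 'sub.domain', 'sub.example'])
print(result)
```

Because each host stops at its first match, no copy of the host list has to be pruned between iterations.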

2 Answers:

Answer 0 (score: 1)

Use conditionals:

list1 = [
  'host1.domain.com',
  'host2.domain.com',
  'host20.sub.domain.com',
  'host31.sub.domain.com',
  'host1.example.com',
  'host1.sub.example.com'
]

list2 = [
    'domain',
    'example'
]

list3 = [
  'sub.domain',
  'sub.example'
]

my_dict = {i:[] for i in list2 + list3}

for i in list1:
    for j in zip(list2, list3):
        if j[1] in i:
            my_dict[j[1]].append(i)

        elif j[0] in i:
            my_dict[j[0]].append(i)

Answer 1 (score: 1)

Here is how I would do it (after a billion edits):

hosts = [
  'host1.domain.com',
  'host2.domain.com',
  'host20.sub.domain.com',
  'host31.sub.domain.com',
  'host1.example.com',
  'host1.sub.example.com'
]

domains = [
  'domain',
  'sub.domain',
  'example',
  'sub.example'
]

import re
import pprint

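# Escape the dot so it matches a literal '.', not any character
dot = r'\.'
anything_but_dot = r'[^.]*'
prefix = anything_but_dot + dot

answer = {}
for domain in domains:
    # re.escape keeps the dots inside names like 'sub.domain' literal
    compiled = re.compile(prefix + re.escape(domain))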
    answer[domain] = []
    for host in hosts:
        if compiled.match(host):
            answer[domain].append(host)

pprint.pprint(answer)

This gives the result:

{'domain': ['host1.domain.com', 'host2.domain.com'],
 'example': ['host1.example.com'],
 'sub.domain': ['host20.sub.domain.com', 'host31.sub.domain.com'],
 'sub.example': ['host1.sub.example.com']}
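A slightly stricter variant of the same idea anchors the pattern at both ends with re.fullmatch, so a domain such as 'domain' cannot match a host under 'domainfoo'. This is a sketch only: hosts_in_domain is a hypothetical helper, and the pattern assumes exactly one host label before the domain and a single-label TLD after it.

```python
import re

def hosts_in_domain(hosts, domain):
    # One label, a literal dot, the escaped domain, a literal dot, one label
    pattern = re.compile(r'[^.]+\.' + re.escape(domain) + r'\.[^.]+')
    return [h for h in hosts if pattern.fullmatch(h)]

hosts = [
    'host1.domain.com',
    'host2.domain.com',
    'host20.sub.domain.com',
    'host31.sub.domain.com',
    'host1.example.com',
    'host1.sub.example.com',
]
domains = ['domain', 'sub.domain', 'example', 'sub.example']

answer = {d: hosts_in_domain(hosts, d) for d in domains}
print(answer)
```

Since every pattern must cover the whole host name, the order of domains does not affect the result.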