Question

I have a dictionary:

hostServiceDict = {
    "http://192.168.1.1:80/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO'],
    "http://192.168.1.2:80/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'UDDC'],
    "http://192.168.1.3:80/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'HTTPServer'],
    "http://192.168.1.4:8080/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'NetcdfSubset'],
    "http://192.168.1.5:8080/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'WCS', 'NCSS'],
    "http://192.168.1.6:80/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'DAP4'],
    "http://192.168.1.7:80/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'NCML', 'DAP4'],
    "http://192.168.1.8:80/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'NetcdfSubset'],
    "http://192.168.1.9:80/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'UDDC'],
    "http://192.168.1.18:80/thredds/catalog.xml": ['OPENDAP', 'WMS', 'HTTP', 'ISO', 'NetcdfSubset'],
}

The two entries http://192.168.1.5 and http://192.168.1.18 have the same IP address but differ in the port part. I need to remove the second of the duplicates. This is what I tried:

result = {}
for urls, services in hostServiceDict.items():
    i = urls.strip('http://').strip('thredds/catalog.xml').split(':')
    ip = i[0]
    if ip not in result.items():
        if ip in urls:
            result[urls] = services

print(result)

But it still gives me the same result as the original dictionary.

Answer 1

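The check if ip not in result.items(): never finds the IP. result.items() yields (url, services) tuples, and a bare IP string never equals one of them, so the condition is always true and nothing is skipped. You have to keep track of the IPs you have already seen: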

result = {}
seen_ips = set()
for url, services in hostServiceDict.items():
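    # Note: str.strip() removes any of the given *characters* from both ends, not a
    # literal prefix/suffix; it happens to work for these URLs but is fragile.
    ip = url.strip('http://').strip('thredds/catalog.xml').split(':')[0]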
    if ip not in seen_ips:
        seen_ips.add(ip)
        result[url] = services

print(result)
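To see why the original check never filters anything, here is a minimal sketch using a tiny hypothetical dictionary (sample is illustrative only, not the question's data). Membership against sample.items() compares a string with (key, value) tuples, so it is always False; the keys do not contain the bare IP either, since they are full URLs. Only a collection of the IPs themselves, like seen_ips above, gives a meaningful check.

```python
# Hypothetical one-entry dictionary, used only to illustrate membership tests.
sample = {"http://192.168.1.5:8080/thredds/catalog.xml": ["OPENDAP"]}
ip = "192.168.1.5"

# items() yields (url, services) tuples, so a bare string is never a member.
print(ip in sample.items())     # False
print(list(sample.items())[0])  # ('http://192.168.1.5:8080/thredds/catalog.xml', ['OPENDAP'])

# The keys are full URLs, so the bare IP is not among them either.
print(ip in sample)             # False

# Tracking the IPs themselves is what makes the check work.
seen_ips = {"192.168.1.5"}
print(ip in seen_ips)           # True
```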

To make the code more robust, you can do real URL parsing:

import re

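def get_host(url):
    # Capture everything after the scheme up to the first ':' (port) or '/' (path).
    # .group(1) returns that string; .groups(0) would return a 1-tuple instead.
    return re.match(r'https?://([^:/]+)', url).group(1)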
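Instead of a regex, the standard library can do the URL parsing: urllib.parse.urlsplit splits a URL into its parts, its hostname attribute is the host without the port (lowercased), and port holds the port as an integer. This is a swapped-in alternative, not part of the answer above; a minimal sketch, assuming every key is an absolute http:// or https:// URL:

```python
from urllib.parse import urlsplit

def get_host(url):
    # hostname is the host part without the port (lowercased),
    # or None if the URL has no network location.
    return urlsplit(url).hostname

print(get_host("http://192.168.1.5:8080/thredds/catalog.xml"))      # 192.168.1.5
print(get_host("http://192.168.1.18:80/thredds/catalog.xml"))       # 192.168.1.18
print(urlsplit("http://192.168.1.5:8080/thredds/catalog.xml").port)  # 8080
```

This version of get_host can be used wherever the regex version is used.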

Then, rather than removing duplicates "by hand", it is easier to build a host -> (url, services) dictionary:

data_by_hostname = {get_host(url): (url, services)
                    for url, services in hostServiceDict.items()}

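This removes the duplicate hostnames. Note that when two URLs share a host, the later one overwrites the earlier one, so the last URL for each host is kept.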

Then, if needed, you can rebuild the url -> services dictionary from the values:

result = dict(data_by_hostname.values())
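The comprehension keeps whichever URL for a host comes last. If you want to keep the first one instead, which is what the question asks for, one option is dict.setdefault, which only stores a value for a key that is not present yet. A minimal sketch (an adjustment, not the answer's original code; sample is hypothetical data in which two URLs share a host and differ only in the port):

```python
import re

def get_host(url):
    # Host part of an http(s) URL, without the port.
    return re.match(r'https?://([^:/]+)', url).group(1)

# Hypothetical data: the first and third URLs share a host, differing only in port.
sample = {
    "http://192.168.1.5:8080/thredds/catalog.xml": ['OPENDAP', 'WCS'],
    "http://192.168.1.6:80/thredds/catalog.xml": ['OPENDAP', 'DAP4'],
    "http://192.168.1.5:80/thredds/catalog.xml": ['OPENDAP', 'NCSS'],
}

data_by_hostname = {}
for url, services in sample.items():
    # setdefault stores (url, services) only if this host has not been seen yet,
    # so the first URL for each host wins.
    data_by_hostname.setdefault(get_host(url), (url, services))

# Rebuild the url -> services mapping from the kept (url, services) pairs.
result = dict(data_by_hostname.values())
print(result)
```

Here the :8080 URL for 192.168.1.5 is kept and the later :80 one is dropped.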

Answer 2

You can keep track of the distinct IPs in a list and check each new IP against the ones already tracked. This only needs a small change to your logic:

result = {}
distinct_ips = []
for urls, services in hostServiceDict.items():
    i = urls.strip('http://').strip('thredds/catalog.xml').split(':')
    ip = i[0]
    if ip not in distinct_ips:
        distinct_ips.append(ip)
        # ip was extracted from urls, so the extra "if ip in urls" check is always true
        result[urls] = services

print(result)

Remove similar entries from a dictionary
