Python - itertools groupby, but only include groups in the new list. Then filter the list?

Time: 2018-07-19 16:04:08

Tags: python itertools

I have two lists of dictionaries containing the following sample data:

List 1:

list_1 = [
    {
        "route": "10.10.4.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5"
    },
    {
        "route": "10.10.5.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5"
    },
    {
        "route": "10.10.8.0",
        "mask": "255.255.255.0",
        "next_hop": "172.16.66.34"
    },
    {
        "route": "10.10.58.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5"
    },
    {
        "route": "172.18.12.4",
        "mask": "255.255.255.252",
        "next_hop": "172.18.1.5"
    }
]

List 2:

list_2 = [
    {
        "route": "10.10.4.0",
        "site": "Edinburgh"
    },
    {
        "route": "10.10.8.0",
        "site": "Manchester"
    },
    {
        "route": "10.10.5.0",
        "site": "London"
    },
]

I merge these lists as follows:

import itertools

temp_merged_data = sorted(itertools.chain(list_1, list_2), key=lambda x:x['route'])
route_data = []
for k,v in itertools.groupby(temp_merged_data, key=lambda x:x['route']):
    d = {}
    for dct in v:
        d.update(dct)
    route_data.append(d) 

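This returns the result below, but I don't want any routes in it that have no site. How would I achieve that? And once I have the final list of dictionaries / JSON, how can I filter it efficiently, for example if I only want to know the next hop for London?

Thanks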

[
    {
        "route": "10.10.4.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5",
        "site": "Edinburgh"
    },
    {
        "route": "10.10.5.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5",
        "site": "London"
    },
    {
        "route": "10.10.58.0",
        "mask": "255.255.255.0",
        "next_hop": "172.18.1.5"
    },
    {
        "route": "10.10.8.0",
        "mask": "255.255.255.0",
        "next_hop": "172.16.66.34",
        "site": "Manchester"
    },
    {
        "route": "172.18.12.4",
        "mask": "255.255.255.252",
        "next_hop": "172.18.1.5"
    }
]

6 answers:

Answer 0 (score: 2)

Here is a solution using pandas:

In [18]: df1=pd.DataFrame(list_1)

In [19]: df2=pd.DataFrame(list_2)    

In [22]: df1.merge(df2, on='route', how='left')
Out[22]: 
              mask      next_hop        route        site
0    255.255.255.0    172.18.1.5    10.10.4.0   Edinburgh
1    255.255.255.0    172.18.1.5    10.10.5.0      London
2    255.255.255.0  172.16.66.34    10.10.8.0  Manchester
3    255.255.255.0    172.18.1.5   10.10.58.0         NaN
4  255.255.255.252    172.18.1.5  172.18.12.4         NaN

To filter out the routes that have no site, for example:

In [29]: merged=df1.merge(df2, on='route', how='left')
In [31]: df=merged[~merged.site.isna()]
Out[31]: 
            mask      next_hop      route        site
0  255.255.255.0    172.18.1.5  10.10.4.0   Edinburgh
1  255.255.255.0    172.18.1.5  10.10.5.0      London
2  255.255.255.0  172.16.66.34  10.10.8.0  Manchester

To filter for Edinburgh only:

df[df['site']=='Edinburgh']

To get it back in your format:

[v for k, v in df.T.to_dict().items()]

Output:

[{'mask': '255.255.255.0',
  'next_hop': '172.18.1.5',
  'route': '10.10.4.0',
  'site': 'Edinburgh'},
 {'mask': '255.255.255.0',
  'next_hop': '172.18.1.5',
  'route': '10.10.5.0',
  'site': 'London'},
 {'mask': '255.255.255.0',
  'next_hop': '172.16.66.34',
  'route': '10.10.8.0',
  'site': 'Manchester'}]
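As a possible shortcut for the last two steps, here is a minimal sketch that assumes a reasonably recent pandas. It uses `dropna(subset=...)` to drop the routes without a site and `to_dict(orient="records")` to get the list of dictionaries directly. The variable names `with_site`, `records` and `london_next_hops` are made up for this example.

```python
import pandas as pd

list_1 = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.5.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.8.0", "mask": "255.255.255.0", "next_hop": "172.16.66.34"},
    {"route": "10.10.58.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "172.18.12.4", "mask": "255.255.255.252", "next_hop": "172.18.1.5"},
]
list_2 = [
    {"route": "10.10.4.0", "site": "Edinburgh"},
    {"route": "10.10.8.0", "site": "Manchester"},
    {"route": "10.10.5.0", "site": "London"},
]

# Left merge keeps every route; routes without a site get NaN in "site".
merged = pd.DataFrame(list_1).merge(pd.DataFrame(list_2), on="route", how="left")

# Drop the rows whose "site" is missing.
with_site = merged.dropna(subset=["site"])

# One dictionary per remaining row.
records = with_site.to_dict(orient="records")

# Next hops of all routes that belong to London.
london_next_hops = with_site.loc[with_site["site"] == "London", "next_hop"].tolist()
```

If the routes without a site are never needed, `how="inner"` in the merge would probably make the `dropna` step unnecessary.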

Answer 1 (score: 0)

import itertools
temp_merged_data = sorted(itertools.chain(list_1, list_2), key=lambda x:x['route'])
route_data = []
for k,v in itertools.groupby(temp_merged_data, key=lambda x:x['route']):
    d = {}
    for dct in v:
        if "site" in dct.keys():   #Check if site is in keys
            d.update(dct)
    if d:
        route_data.append(d)
print(route_data)

Output:

[{'route': '10.10.4.0', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'site': 'London'}, {'route': '10.10.8.0', 'site': 'Manchester'}]
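Note that the output above contains only `route` and `site`, because only the dictionaries coming from `list_2` pass the `"site" in dct` check. If `mask` and `next_hop` should be kept as well (which I assume is what the question wants), one possible variant merges every dictionary in the group first and checks the merged result afterwards. This is a sketch, not the original answer's code:

```python
import itertools
from operator import itemgetter

list_1 = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.5.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.8.0", "mask": "255.255.255.0", "next_hop": "172.16.66.34"},
    {"route": "10.10.58.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "172.18.12.4", "mask": "255.255.255.252", "next_hop": "172.18.1.5"},
]
list_2 = [
    {"route": "10.10.4.0", "site": "Edinburgh"},
    {"route": "10.10.8.0", "site": "Manchester"},
    {"route": "10.10.5.0", "site": "London"},
]

by_route = itemgetter("route")

# groupby only groups adjacent items, so the input must be sorted by the same key.
temp_merged_data = sorted(itertools.chain(list_1, list_2), key=by_route)

route_data = []
for route, group in itertools.groupby(temp_merged_data, key=by_route):
    d = {}
    for dct in group:
        d.update(dct)      # merge every dict that shares this route
    if "site" in d:        # keep the merged dict only if a site was found
        route_data.append(d)

print(route_data)
```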

Answer 2 (score: 0)

You can filter the result:

d = [{'route': '10.10.4.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'London'}, {'route': '10.10.58.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5'}, {'route': '10.10.8.0', 'mask': '255.255.255.0', 'next_hop': '172.16.66.34', 'site': 'Manchester'}, {'route': '172.18.12.4', 'mask': '255.255.255.252', 'next_hop': '172.18.1.5'}]
new_d = [i for i in d if i.get('site')]

Output:

[{'route': '10.10.4.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'Edinburgh'}, {'route': '10.10.5.0', 'mask': '255.255.255.0', 'next_hop': '172.18.1.5', 'site': 'London'}, {'route': '10.10.8.0', 'mask': '255.255.255.0', 'next_hop': '172.16.66.34', 'site': 'Manchester'}]
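The same `.get('site')` idea can also cover the second part of the question, the next hop for a given site. A minimal sketch follows; the helper name `next_hops_for` is made up for this example.

```python
d = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5", "site": "Edinburgh"},
    {"route": "10.10.5.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5", "site": "London"},
    {"route": "10.10.58.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.8.0", "mask": "255.255.255.0", "next_hop": "172.16.66.34", "site": "Manchester"},
    {"route": "172.18.12.4", "mask": "255.255.255.252", "next_hop": "172.18.1.5"},
]


def next_hops_for(site, routes):
    """Return the next hop of every route that belongs to the given site."""
    # .get() returns None for routes without a "site" key, so they never match.
    return [r["next_hop"] for r in routes if r.get("site") == site]


print(next_hops_for("London", d))  # ['172.18.1.5']
```

This scans the whole list on every call, which should be fine for a handful of routes. For many lookups, a dictionary keyed by site is likely the better choice.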

Answer 3 (score: 0)

Use an actual data analysis tool, such as pandas:

import pandas as pd

df1 = pd.DataFrame(list_1)
df2 = pd.DataFrame(list_2)

print(df1.merge(df2))
#             mask      next_hop      route        site
# 0  255.255.255.0    172.18.1.5  10.10.4.0   Edinburgh
# 1  255.255.255.0    172.18.1.5  10.10.5.0      London
# 2  255.255.255.0  172.16.66.34  10.10.8.0  Manchester

Answer 4 (score: 0)

>>> from itertools import groupby, chain
>>> from pprint import pprint
>>> temp_merged_data  = sorted(chain(list_1, list_2), key=lambda x:x['route'])
>>> route_data = [dict(chain(*map(dict.items, v))) for k,v in groupby(temp_merged_data, key=lambda x:x['route'])]
>>> route_data = [d for d in route_data if 'site' in d]
>>> pprint (route_data)
[{'mask': '255.255.255.0',
  'next_hop': '172.18.1.5',
  'route': '10.10.4.0',
  'site': 'Edinburgh'},
 {'mask': '255.255.255.0',
  'next_hop': '172.18.1.5',
  'route': '10.10.5.0',
  'site': 'London'},
 {'mask': '255.255.255.0',
  'next_hop': '172.16.66.34',
  'route': '10.10.8.0',
  'site': 'Manchester'}]

Now, if you convert the route data into a dict keyed by site, you can access each site's parameters more easily:

>>> route_dict = {d['site']:d for d in route_data}
>>> route_dict['London']['next_hop']
'172.18.1.5'
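One caveat: a dict keyed by site keeps only one route per site, so a later route silently replaces an earlier one. If a site can have several routes, a sketch along these lines, using `collections.defaultdict`, might be safer. The second London route (`10.10.6.0`) is invented purely for illustration and is not part of the question's data.

```python
from collections import defaultdict

route_data = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5", "site": "Edinburgh"},
    {"route": "10.10.5.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5", "site": "London"},
    {"route": "10.10.8.0", "mask": "255.255.255.0", "next_hop": "172.16.66.34", "site": "Manchester"},
    # Hypothetical extra route, added only to show a site with two routes.
    {"route": "10.10.6.0", "mask": "255.255.255.0", "next_hop": "172.18.1.9", "site": "London"},
]

# Map each site to the list of all its routes.
routes_by_site = defaultdict(list)
for d in route_data:
    routes_by_site[d["site"]].append(d)

london_next_hops = [d["next_hop"] for d in routes_by_site["London"]]
print(london_next_hops)  # ['172.18.1.5', '172.18.1.9']
```

Building the index is a single pass over the list. After that, each lookup by site is a dictionary access instead of a scan.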

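Answer 5 (score: 0)

Given the structure of these lists (route information and a route-to-site mapping), I don't think merging and grouping are needed at all.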

routes_to_sites = {rs['route']: rs['site'] for rs in list_2}
route_data = []
for ri in list_1:
    site = routes_to_sites.get(ri['route'])
    if site is not None:
        route_data.append({**ri, 'site': site})
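For completeness, here is the same lookup-based approach as a self-contained sketch that runs on the sample data from the question. The function name `attach_sites` is made up for this example, and the `{**r, ...}` syntax assumes Python 3.5 or later.

```python
list_1 = [
    {"route": "10.10.4.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.5.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "10.10.8.0", "mask": "255.255.255.0", "next_hop": "172.16.66.34"},
    {"route": "10.10.58.0", "mask": "255.255.255.0", "next_hop": "172.18.1.5"},
    {"route": "172.18.12.4", "mask": "255.255.255.252", "next_hop": "172.18.1.5"},
]
list_2 = [
    {"route": "10.10.4.0", "site": "Edinburgh"},
    {"route": "10.10.8.0", "site": "Manchester"},
    {"route": "10.10.5.0", "site": "London"},
]


def attach_sites(routes, sites):
    """Return a copy of each route that has a known site, with "site" added."""
    site_by_route = {s["route"]: s["site"] for s in sites}
    return [
        {**r, "site": site_by_route[r["route"]]}
        for r in routes
        if r["route"] in site_by_route   # routes without a site are skipped
    ]


route_data = attach_sites(list_1, list_2)
print(route_data)
```

No sorting is involved here, so the result keeps the order of `list_1` and the input dictionaries are left unmodified.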