Question

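I have a list of dicts that specify flows (a source going via a hop to a destination, each with its own volume). Now I want to split these flows into links (i.e. source to hop with the volume, and hop to destination with the volume) and merge all duplicate links by summing their volumes.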

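Since I'm new to Python, I'm wondering what a good approach would be. My first idea was to loop through all the flows and, inside that loop, run a nested loop over all the links to check whether the link already exists.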

But if I have millions of flows, I expect this will become very inefficient and slow.

My starting data looks like this:

flows = [
    {
        'source': 1,
        'hop': 2,
        'destination': 3,
        'volume': 100,
    },{
        'source': 1,
        'hop': 2,
        'destination': 4,
        'volume': 50,
    },{
        'source': 2,
        'hop': 2,
        'destination': 4,
        'volume': 200,
    },
]

This is what my result should look like:

links = [
    {
        'source': 1,
        'hop': 2,
        'volume': 150,
    },{
        'hop': 2,
        'destination': 3,
        'volume': 100,
    },{
        'hop': 2,
        'destination': 4,
        'volume': 250,
    },{
        'source': 2,
        'hop': 2,
        'volume': 200,
    },
]

Thank you very much for your help!

Answer 1

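You can collect the links into two separate dictionaries: one for the links between source and hop, and another for the links between hop and destination. Then you can easily build the result list from the two dicts. The code below uses Counter, a dict-like object whose default value is 0: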

import pprint
from collections import Counter

flows = [
    {
        'source': 1,
        'hop': 2,
        'destination': 3,
        'volume': 100.5,
    },{
        'source': 1,
        'hop': 2,
        'destination': 4,
        'volume': 50,
    },{
        'source': 2,
        'hop': 2,
        'destination': 4,
        'volume': 200.7,
    },
]

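# One Counter per link type, keyed by the pair of nodes the link connects.
sources = Counter()  # (source, hop) -> summed volume
hops = Counter()     # (hop, destination) -> summed volume

# Single pass: each flow contributes its volume to exactly two links.
for f in flows:
    sources[f['source'], f['hop']] += f['volume']
    hops[f['hop'], f['destination']] += f['volume']

# Turn the two Counters back into a list of link dicts.
res = [{'source': source, 'hop': hop, 'volume': vol} for (source, hop), vol in sources.items()]
res.extend([{'hop': hop, 'destination': dest, 'volume': vol} for (hop, dest), vol in hops.items()])
pprint.pprint(res)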

Output:

[{'hop': 2, 'source': 1, 'volume': 150.5},
 {'hop': 2, 'source': 2, 'volume': 200.7},
 {'destination': 3, 'hop': 2, 'volume': 100.5},
 {'destination': 4, 'hop': 2, 'volume': 250.7}]

The above runs in O(n) time, so it should work with millions of flows as long as you have enough memory.
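The linear running time follows from each flow triggering exactly two dictionary updates, with no search through existing links. That works because a Counter returns 0 for a key it has not seen yet. A minimal sketch of that behavior, using made-up link keys and volumes:

```python
from collections import Counter

links = Counter()

# A missing key reads as 0, so no "does this link exist?" check is needed.
first_lookup = links[(1, 2)]

links[(1, 2)] += 100  # first time this (source, hop) link is seen
links[(1, 2)] += 50   # duplicate link: the volume is simply added
links[(2, 2)] += 200  # a different link gets its own entry

print(dict(links))
```

Each update is an average constant-time hash lookup, which is what replaces the nested loop from the question.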

Answer 2

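Pseudo-algorithm:

Create an empty result list / set / dictionary
Loop over the list of flows
Split each flow into 2 links
For each of these 2 links, test whether it is already in the result (based on its 2 nodes).
If not: add it. If so: increase the volume of the entry that is already there.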
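The steps above could be sketched in Python roughly as follows, using a plain dictionary as the result container so that the "already there?" test is a key lookup rather than a scan. The function name flows_to_links and the 'source-hop' / 'hop-destination' tags are illustrative assumptions, not part of the question. The tag keeps a source-to-hop link apart from a hop-to-destination link that happens to join the same two node numbers.

```python
def flows_to_links(flows):
    """Split each flow into two links and sum the volumes of duplicate links."""
    totals = {}  # (kind, node_a, node_b) -> summed volume
    for flow in flows:
        two_links = [
            ('source-hop', flow['source'], flow['hop']),
            ('hop-destination', flow['hop'], flow['destination']),
        ]
        for key in two_links:
            # Missing link: start from 0. Existing link: add to its volume.
            totals[key] = totals.get(key, 0) + flow['volume']

    # Convert the keyed totals back into the list-of-dicts shape.
    links = []
    for (kind, a, b), volume in totals.items():
        if kind == 'source-hop':
            links.append({'source': a, 'hop': b, 'volume': volume})
        else:
            links.append({'hop': a, 'destination': b, 'volume': volume})
    return links


flows = [
    {'source': 1, 'hop': 2, 'destination': 3, 'volume': 100},
    {'source': 1, 'hop': 2, 'destination': 4, 'volume': 50},
    {'source': 2, 'hop': 2, 'destination': 4, 'volume': 200},
]
links = flows_to_links(flows)
print(links)
```

With the data from the question this yields the four links with volumes 150, 100, 250 and 200. If the result were kept as a list and searched for each link, the work would grow with the number of links per flow, which is the slowdown the question is worried about.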

Python: reduce a list of dicts based on matching key/value pairs
