Linking data without iterating

Question

I have a large set of From/To pairs that represent a hierarchy of connected nodes. For example, the hierarchy:

     4 -- 5 -- 8
    / 
   2 --- 6 - 9 -- 10
  /           \ 
 1              -- 11
  \
   3 ----7

is encapsulated as:

{(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}

I'd like to be able to create a function that returns all nodes upstream of a given node, e.g.:

nodes[2].us
> [4, 5, 6, 8, 9, 10, 11]

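My actual set of nodes is in the tens of thousands, so I'd like to be able to return the list of all upstream nodes very quickly, without having to recurse over the entire set every time I want an upstream set.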

This is my best attempt so far, but it doesn't get beyond two levels up.

class Node:
    def __init__(self, fr, to):
        self.fr = fr
        self.to = to
        self.us = set()

def build_hierarchy(nodes):
    for node in nodes.values():
        if node.to in nodes:
            nodes[node.to].us.add(node)
    for node in nodes.values():
        for us_node in node.us.copy():
            node.us |= us_node.us
    return nodes

from_to = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3), (1, 0)}
nodes = {fr: Node(fr, to) for fr, to in from_to} # node objects indexed by "from"
nodes = build_hierarchy(nodes)

print [node.fr for node in nodes[2].us]
> [4, 6, 5, 9]

Answer 1

Here's a function that will calculate the entire upstream list for a single node:

def upstream_nodes(start_node):
    result = []
    current = start_node
    while current.to:  # current.to == 0 means we're at the root node
        result.append(current.to)
        current = nodes[current.to]
    return result

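You've said that you don't want to iterate over the entire set of nodes each time you query an upstream, and this won't: it only queries the node's parent, then that parent's parent, and so on up to the root. So if the node is four levels down, it will do four dictionary lookups.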
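To make the lookup count concrete, here is a minimal, self-contained sketch of the same walk on the sample data. It assumes the `Node` class from the question and passes `nodes` in explicitly (that parameter is my addition). Note that it follows the `to` links toward the root, so for node 11 it yields the chain 9, 6, 2, 1.

```python
class Node:
    def __init__(self, fr, to):
        self.fr = fr
        self.to = to

from_to = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4),
           (4, 2), (2, 1), (3, 1), (7, 3), (1, 0)}
nodes = {fr: Node(fr, to) for fr, to in from_to}

def upstream_nodes(start_node, nodes):
    """Follow the `to` links until they leave the dictionary (the root)."""
    result = []
    current = start_node
    while current.to in nodes:  # one dictionary lookup per level
        result.append(current.to)
        current = nodes[current.to]
    return result

print(upstream_nodes(nodes[11], nodes))  # [9, 6, 2, 1]
```

Node 11 sits four levels below node 1, and the loop body runs exactly four times.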

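Or, if you want to be really clever, here's a version that looks up each parent only once and then stores the result in the Node object's .us attribute, so you never have to calculate that value again. (This works as long as the nodes' parent links don't change after you've created the graph. If you change the graph, of course, it won't.)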

def caching_upstream_nodes(start_node, nodes):
    # start_node is the Node object whose upstream set you want
    # nodes is the dictionary you created mapping ints to Node objects
    if start_node.us:
        # We already calculated this once, no need to re-calculate
        return start_node.us
    parent = nodes.get(start_node.to)
    if parent is None:
        # We're at the root node
        start_node.us = set()
        return start_node.us
    # Otherwise, our upstream is our parent's upstream, plus the parent
    parent_upstream = caching_upstream_nodes(parent, nodes)
    start_node.us = parent_upstream.copy()
    start_node.us.add(start_node.to)
    return start_node.us

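One of these two functions should be what you're looking for. (Note: be a little careful when running them, since I've just written them and haven't taken the time to test them. I believe the algorithm is correct, but there's always a chance I made a basic mistake while writing it.)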
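As a quick check of the caching idea, here is a hedged variant run on the sample data. One change is my own assumption rather than part of the answer: `us` starts as `None` instead of an empty set, so that an empty result (the root) also counts as already computed.

```python
class Node:
    def __init__(self, fr, to):
        self.fr = fr
        self.to = to
        self.us = None  # assumption: None marks "not computed yet"

def caching_upstream_nodes(start_node, nodes):
    if start_node.us is not None:
        return start_node.us  # cached by an earlier call
    parent = nodes.get(start_node.to)
    if parent is None:
        start_node.us = set()  # root: nothing above it
    else:
        # Parent's set plus the parent itself; `|` builds a new set,
        # so the parent's cached set is not modified.
        start_node.us = caching_upstream_nodes(parent, nodes) | {start_node.to}
    return start_node.us

from_to = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4),
           (4, 2), (2, 1), (3, 1), (7, 3), (1, 0)}
nodes = {fr: Node(fr, to) for fr, to in from_to}

print(sorted(caching_upstream_nodes(nodes[11], nodes)))  # [1, 2, 6, 9]
print(sorted(nodes[9].us))  # [1, 2, 6], filled in by the call above
```

The second print shows the side effect that makes later queries cheap: every node on the path from 11 to the root has its set stored after a single call.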

Answer 2

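I'll present two ways of doing this. First, we'll simply modify your us attribute to intelligently compute and cache the results of descendant lookups. Second, we'll use a graph library, networkx.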

If your data naturally has a graph structure, I'd recommend going with a graph library. You'll save yourself a lot of trouble that way.

Caching the `us` node attribute

You can make the us attribute a property, and cache the results of previous lookups:

class Node(object):

    def __init__(self):
        self.name = None
        self.parent = None
        self.children = set()
        self._upstream = set()

    def __repr__(self):
        return "Node({})".format(self.name)

    @property
    def upstream(self):
        if self._upstream:
            return self._upstream
        else:
            for child in self.children:
                self._upstream.add(child)
                self._upstream |= child.upstream
            return self._upstream

Note that I'm using a slightly different representation than yours. I'll create the graph:

import collections

edges = {(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2), (2, 1), (3, 1), (7, 3)}
nodes = collections.defaultdict(lambda: Node())

for node, parent in edges:
    nodes[node].name = node
    nodes[parent].name = parent
    nodes[node].parent = nodes[parent]
    nodes[parent].children.add(nodes[node])

And I'll look up the upstream nodes of node 2:

>>> nodes[2].upstream
{Node(5), Node(4), Node(11), Node(9), Node(6), Node(8), Node(10)}

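Once the upstream nodes have been computed for node 2, they will not be recomputed when you later call, for example, nodes[1].upstream. If you make any changes to the graph, though, the cached upstream nodes will be incorrect.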
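If the graph does need to change after the first lookup, one possible approach is to clear the cache on the affected node and all of its ancestors. The sketch below is only an illustration under that assumption: `add_child` is a hypothetical helper that is not part of the class above, and the cache uses `None` as its "not computed" marker.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = set()
        self._upstream = None  # None means "not computed yet"

    @property
    def upstream(self):
        if self._upstream is None:
            found = set()
            for child in self.children:
                found.add(child)
                found |= child.upstream
            self._upstream = found
        return self._upstream

    def add_child(self, child):
        # Hypothetical helper: link the child, then clear the cache on this
        # node and every ancestor, because all of their sets just changed.
        child.parent = self
        self.children.add(child)
        node = self
        while node is not None:
            node._upstream = None
            node = node.parent

edges = [(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4),
         (4, 2), (2, 1), (3, 1), (7, 3)]
nodes = {}
for name, parent in edges:
    for key in (name, parent):
        nodes.setdefault(key, Node(key))
    nodes[parent].add_child(nodes[name])

before = sorted(n.name for n in nodes[2].upstream)
nodes[12] = Node(12)
nodes[11].add_child(nodes[12])  # invalidates 11, 9, 6, 2 and 1
after = sorted(n.name for n in nodes[2].upstream)
print(before)  # [4, 5, 6, 8, 9, 10, 11]
print(after)   # [4, 5, 6, 8, 9, 10, 11, 12]
```

Only the nodes on the path to the root lose their cache; the set stored on node 4, for instance, is reused as-is.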

Using `networkx`

If we use networkx to represent our graph, looking up all of a node's descendants is very simple:

>>> import networkx as nx
>>> from_to = [(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2),
...            (2, 1), (3, 1), (7, 3), (1, 0)]
>>> graph = nx.DiGraph(from_to).reverse()
>>> nx.descendants(graph, 2)
{4, 5, 6, 8, 9, 10, 11}

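This doesn't fully answer your question, which seems to be about optimizing the descendant lookup so that subsequent calls don't repeat work. But, for all we know, networkx.descendants might do some intelligent caching.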

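So here is my suggestion: avoid premature optimization and use the library. If networkx.descendants is too slow, then you can investigate the networkx code to see whether it caches lookups. If it doesn't, you can build your own caching lookup using more primitive networkx functions. My bet is that networkx.descendants will work just fine and you won't need to do the extra work.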
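If a cached lookup does turn out to be necessary, one option is to compute every node's set once, up front. The following is a dependency-free sketch in plain Python rather than anything taken from networkx; `build_upstream` is a hypothetical name, and it assumes each node has exactly one parent, as in the From/To pairs. It avoids recursion, which may matter with tens of thousands of nodes in a deep chain.

```python
from collections import defaultdict

def build_upstream(from_to):
    """Map every node to the set of all nodes upstream of it."""
    children = defaultdict(set)
    parent_of = {}
    for node, parent in from_to:
        children[parent].add(node)
        parent_of[node] = parent

    # Number of children still to be processed for each non-leaf node.
    pending = {node: len(kids) for node, kids in children.items()}
    upstream = {node: set() for node in set(parent_of) | set(children)}
    ready = [node for node in upstream if node not in pending]  # leaves

    while ready:
        node = ready.pop()
        parent = parent_of.get(node)
        if parent is None:
            continue  # reached a root
        # This node's set is complete, so fold it into the parent's set.
        upstream[parent].add(node)
        upstream[parent] |= upstream[node]
        pending[parent] -= 1
        if pending[parent] == 0:
            ready.append(parent)
    return upstream

from_to = [(11, 9), (10, 9), (9, 6), (6, 2), (8, 5), (5, 4), (4, 2),
           (2, 1), (3, 1), (7, 3), (1, 0)]
upstream = build_upstream(from_to)
print(sorted(upstream[2]))  # [4, 5, 6, 8, 9, 10, 11]
```

After the single pass, each query is one dictionary lookup. As with the property-based cache, the result goes stale if the pairs change and would need to be rebuilt.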
