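Python: sorting a dependency list

Time: 2012-07-19 08:56:01

Tags: python sorting topological-sort

I'm trying to work out whether my problem can be solved with the built-in sorted() function or whether I need to do it myself. The old-school way, using cmp, would have been relatively easy.

My data set looks like this: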

x = [
('business', Set('fleet','address'))
('device', Set('business','model','status','pack'))
('txn', Set('device','business','operator'))
....

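The sort rule should basically be: for all values of N and Y where Y > N, x[N][0] not in x[Y][1]

Although I'm using Python 2.6, where the cmp argument is still available, I'm trying to keep this Python 3 safe.

So, can this be done with some lambda magic and the key argument?

-== UPDATE ==-

Thanks Eli & Winston! I didn't really think that using key would work, and if it could, I suspected it would be a shoehorned solution, which isn't ideal.

Because my problem involves database table dependencies, I had to make a minor addition to Eli's code to remove an item from its own list of dependencies (in a well-designed database this wouldn't happen, but who lives in that magical perfect world?)

My solution: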

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, set(names of dependencies))`` pairs
    :returns: generator of names, with dependencies yielded first
    """
    pending = [(name, set(deps)) for name, deps in source]
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(set((name,)), emitted) # <-- pop self from dep, req Py2.6
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name) # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted
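
The self-dependency fix relies on set.difference_update() accepting more than one iterable, which the code comment notes requires Python 2.6. A minimal sketch of that single call in isolation; the table names and the contents of `emitted` are illustrative assumptions, not taken from a real schema:

```python
# Illustrative data: a table that (badly) lists itself as a dependency.
name = "device"
deps = set(["device", "business", "model"])
emitted = ["business"]  # names produced by the previous pass

# Remove the entry's own name and everything already emitted, in one call.
deps.difference_update(set((name,)), emitted)

remaining = sorted(deps)
print(remaining)  # ['model']
```

Only "model" is left, so this entry would be rechecked on the next pass instead of waiting forever on itself.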

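4 Answers:

Answer 0 (score: 16)

What you want is called a topological sort. While it's possible to implement one using the built-in sort(), it's rather awkward, and it's better to implement a topological sort directly in Python.

Why is it going to be awkward? If you study the two algorithms on the wiki page, they both rely on a running set of "marked nodes", a concept that's hard to contort into a form sort() can use, since key=xxx (or even cmp=xxx) works best with stateless comparison functions, particularly because timsort doesn't guarantee the order in which the elements will be examined. I'm (pretty) sure that any solution which does use sort() is going to end up redundantly calculating some information on each call to the key/cmp function in order to get around the statelessness issue.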
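
To make the awkwardness concrete, here is a small sketch of what goes wrong with a pairwise comparison. The function `dep_cmp` and the table name "audit" are hypothetical, introduced only for illustration. A dependency relation is a partial order, so unrelated items have to compare as "equal", and that equality is not transitive:

```python
def dep_cmp(a, b, deps):
    """Hypothetical cmp-style function.

    Returns -1 if b depends on a, 1 if a depends on b, else 0.
    """
    if a in deps.get(b, ()):
        return -1
    if b in deps.get(a, ()):
        return 1
    return 0

# 'txn' depends on 'device'; 'audit' is unrelated to both.
deps = {"txn": set(["device"])}

print(dep_cmp("device", "audit", deps))  # 0
print(dep_cmp("audit", "txn", deps))     # 0
print(dep_cmp("device", "txn", deps))    # -1
```

"device" ties with "audit" and "audit" ties with "txn", yet "device" must precede "txn". A comparison sort is entitled to assume ties are transitive, so it may never compare the one pair that matters.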

The following is the algorithm I've been using (to sort some JavaScript library dependencies):

Edit: reworked this greatly, based on Winston Ewert's solution

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, [list of dependencies])`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source] # copy deps so we can modify set in-place
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted) # remove deps we emitted last pass
            if deps: # still has deps? recheck during next pass
                next_pending.append(entry)
            else: # no more deps? time to emit
                yield name
                emitted.append(name) # <-- not required, but helps preserve original ordering
                next_emitted.append(name) # remember what we emitted for difference_update() in next pass
        if not next_emitted: # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted

Side note: it is possible to shoehorn a cmp() function into a key(), as mentioned in this Python bug tracker message.
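
For reference, the standard library ships a ready-made adapter for this, functools.cmp_to_key, available from Python 2.7 and 3.2 onward (so not on the asker's 2.6). A minimal sketch with a hypothetical comparison function `compare_length`; note that it only helps when the comparison defines a consistent total order, which a dependency relation does not:

```python
from functools import cmp_to_key  # Python 2.7+ / 3.2+

def compare_length(a, b):
    """Hypothetical old-style cmp function: order strings by length."""
    return (len(a) > len(b)) - (len(a) < len(b))

names = ["business", "txn", "device"]
ordered = sorted(names, key=cmp_to_key(compare_length))
print(ordered)  # ['txn', 'device', 'business']
```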

Answer 1 (score: 6)

I do a topological sort along these lines:

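# Not defined in the original answer; assumed to be a plain Exception subclass.
class TopologicalSortFailure(Exception):
    pass

def topological_sort(items):
    provided = set()  # names emitted so far
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            if dependencies.issubset(provided):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        # A full pass with nothing emitted means a cycle or a missing item.
        if not emitted:
            raise TopologicalSortFailure()

        items = remaining_items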

I think it's a little more straightforward than Eli's version; I don't know about efficiency.
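
One thing to watch with this version: a name only enters `provided` when it appears as an item in its own right. With the question's data, 'fleet' and 'address' are never listed as items, so 'business' would never be emitted and the sort would raise. A possible pre-processing step is sketched below; `add_missing_leaves` is a hypothetical helper, not part of the answer:

```python
def add_missing_leaves(pairs):
    """Hypothetical helper: append (name, set()) for every dependency
    that never appears as an item in its own right."""
    pairs = [(name, set(deps)) for name, deps in pairs]
    known = set(name for name, _ in pairs)
    missing = set()
    for _, deps in pairs:
        missing.update(deps - known)
    # sorted() is only here to make the output deterministic.
    return pairs + [(name, set()) for name in sorted(missing)]

x = [
    ("business", ["fleet", "address"]),
    ("device", ["business", "model", "status", "pack"]),
    ("txn", ["device", "business", "operator"]),
]
padded = add_missing_leaves(x)
names = [name for name, _ in padded]
print(names)
```

The padded list can then be passed to topological_sort(), since every dependency now has an entry with an empty dependency set.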

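Answer 2 (score: 5)

Looking past the bad formatting and this strange Set type... (I've kept them as tuples and delimited the list items correctly...) ... and using the networkx library to make things convenient...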

x = [
    ('business', ('fleet','address')),
    ('device', ('business','model','status','pack')),
    ('txn', ('device','business','operator'))
]

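import networkx as nx

g = nx.DiGraph()
for key, vals in x:
    for val in vals:
        # Edges run item -> dependency, so each item is listed before the
        # things it depends on; use g.add_edge(val, key) for the reverse.
        g.add_edge(key, val)

# list() also covers versions where topological_sort() returns a generator.
print(list(nx.topological_sort(g)))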
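
If a third-party dependency is not wanted and Python 3.9 or newer is available (an assumption, since the question targets 2.6 with an eye on Python 3), the standard library's graphlib module offers a comparable shortcut. This is a different technique from the networkx call above, shown only as a sketch:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps to the names it depends on (its predecessors).
graph = {
    "business": {"fleet", "address"},
    "device": {"business", "model", "status", "pack"},
    "txn": {"device", "business", "operator"},
}

# static_order() yields every node after all of its predecessors,
# so dependencies come out first.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Names that appear only as dependencies are added to the graph automatically, and a cycle raises graphlib.CycleError.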

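Answer 3 (score: 0)

Here is Winston's suggestion, with a docstring and a tiny tweak: dependencies.issubset(provided) is reversed into provided.issuperset(dependencies). That change lets you pass the dependencies in each input pair as an arbitrary iterable rather than necessarily a set.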
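
A short sketch of why the reversal matters: issuperset() is called on `provided`, which is always a set, and it accepts any iterable as its argument, whereas issubset() would have to be called on the dependencies themselves, and lists and tuples have no such method. The names below are illustrative only:

```python
provided = set(["business", "model"])

list_ok = provided.issuperset(["business", "model"])   # list argument
tuple_ok = provided.issuperset(("business", "pack"))   # tuple argument
list_has_issubset = hasattr(["business"], "issubset")

print(list_ok)            # True
print(tuple_ok)           # False ('pack' has not been provided)
print(list_has_issubset)  # False
```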

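My use case involves a dict whose keys are the item strings, with the value for each key being a list of the names of the items on which that key depends. Once I've determined that the dict is non-empty, I can pass its iteritems() to the modified algorithm.

Thanks again to Winston.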

# Not defined in the original answer; assumed to be a plain Exception subclass.
class TopologicalSortFailure(Exception):
    pass

def topological_sort(items):
    """
    'items' is an iterable of (item, dependencies) pairs, where 'dependencies'
    is an iterable of values of the same type as 'item'.

    If 'items' is a generator rather than a data structure, it should not be
    empty. Passing an empty generator for 'items' (zero yields before return)
    will cause topological_sort() to raise TopologicalSortFailure.

    An empty iterable (e.g. list, tuple, set, ...) produces no items but
    raises no exception.
    """
    provided = set()  # names emitted so far
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        # A full pass with nothing emitted means a cycle or a missing item.
        if not emitted:
            raise TopologicalSortFailure()

        items = remaining_items
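
A usage sketch for the dict case described above, assuming Python 3, where items() replaces iteritems(). The function is restated in condensed form only so the snippet runs on its own; the exception class and the sample dict are assumptions for illustration:

```python
class TopologicalSortFailure(Exception):
    """Assumed exception type; the answers do not define it."""

def topological_sort(items):
    # Condensed restatement of the function above.
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:
            raise TopologicalSortFailure()
        items = remaining_items

# Illustrative dict: item name -> list of names it depends on.
deps = {
    "txn": ["device", "business"],
    "device": ["business"],
    "business": [],
}
order = list(topological_sort(deps.items()))
print(order)  # ['business', 'device', 'txn']
```

On Python 3 the items() view of an empty dict is falsy, so the explicit non-empty check is only needed for true iterators such as Python 2's iteritems() or a generator.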