Given a linear order fully specified by a list of string tuples, output the order as a list of strings

Time: 2016-12-12 01:00:15

Tags: python

Given pairs of items of the form [(a,b),...], where (a,b) means a > b, for example:

[('best','better'),('best','good'),('better','good')]

I want to output a list of the form:

['best','better','good']

For some reason this is hard. Any ideas?

======================== code ========================

I know why it doesn't work.

def to_rank(raw):

  rank = []

  for u,v in raw:
    if u in rank and v in rank:
      pass

    elif u not in rank and v not in rank:
      rank = insert_front (u,v,rank)
      rank = insert_behind(v,u,rank)

    elif u in rank and v not in rank:
      rank = insert_behind(v,u,rank)

    elif u not in rank and v in rank:
      rank = insert_front(u,v,rank)

  return [[r] for r in rank]

# @Use: insert word u infront of word v in list of words
def insert_front(u,v,words):
  if words == []: return [u]
  else:
    head = words[0]
    tail = words[1:]
    if head == v: return [u] + words
    else        : return ([head] + insert_front(u,v,tail))

# @Use: insert word u behind word v in list of words
def insert_behind(u,v,words):
  words.reverse()
  words = insert_front(u,v,words)
  words.reverse()
  return words

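=================== Update ===================

As many people suggested, this is a straightforward topological sort setup. I ultimately decided to use this source code: algocoding.wordpress.com/2015/04/05/topological-sorting-python/

It solved my problem.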

from collections import deque

def go_topsort(graph):
    in_degree = {u: 0 for u in graph}   # determine in-degree
    for u in graph:                     # of each node
        for v in graph[u]:
            in_degree[v] += 1

    Q = deque()                 # collect nodes with zero in-degree
    for u in in_degree:
        if in_degree[u] == 0:
            Q.appendleft(u)

    L = []     # list for order of nodes

    while Q:
        u = Q.pop()          # choose node of zero in-degree
        L.append(u)          # and 'remove' it from graph
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                Q.appendleft(v)

    if len(L) == len(graph):
        return L
    else:                    # if there is a cycle,
        return []
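
The function above takes an adjacency mapping rather than a list of pairs, and it expects every node to appear as a key. A minimal sketch of one way to build that mapping from the (higher, lower) pairs in the question follows; pairs_to_graph is a hypothetical helper name, not part of the linked source.

```python
def pairs_to_graph(pairs):
    """Build {node: [lower-ranked nodes]} from (higher, lower) pairs."""
    graph = {}
    for u, v in pairs:
        graph.setdefault(u, []).append(v)
        # Make sure the lower item is also a key, even with no outgoing edges.
        graph.setdefault(v, [])
    return graph

pairs = [('best', 'better'), ('best', 'good'), ('better', 'good')]
graph = pairs_to_graph(pairs)
print(graph)
```

The resulting dictionary can then be passed to a Kahn-style function such as the one above.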

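RockBilly's solution also works in my case, because in my setting, for every v < u we are guaranteed to have a pair (u,v) in the list. So his answer is not great computer science, but in this case it gets the job done.

7 answers:

Answer 0: (score: 2)

If you have specified a complete grammar, then you can simply count the items: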

>>> import itertools as it
>>> from collections import Counter
>>> ranks = [('best','better'),('best','good'),('better','good')]
>>> c = Counter(x for x, y in ranks)
>>> sorted(set(it.chain(*ranks)), key=c.__getitem__, reverse=True)
['best', 'better', 'good']

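If your grammar is incomplete, then you can build a graph and DFS all paths to find the longest one. This isn't very efficient, as I haven't thought about that yet :):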

def dfs(graph, start, end):
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for next_state in graph.get(path[-1], []):
            if next_state in path:
                continue
            stack.append(path+[next_state])

def paths(ranks):
    graph = {}
    for n, m in ranks:
        graph.setdefault(n,[]).append(m)
    for start, end in it.product(set(it.chain(*ranks)), repeat=2):
        yield from dfs(graph, start, end)

>>> ranks = [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')]
>>> max(paths(ranks), key=len)
['black', 'dim', 'dark', 'gloomy']
>>> ranks = [('a','c'), ('b','a'),('b','c'), ('d','a'), ('d','b'), ('d','c')]
>>> max(paths(ranks), key=len)
['d', 'b', 'a', 'c']

Answer 1: (score: 1)

What you are looking for is a topological sort. You can do this in linear time using depth-first search (pseudocode is included in the wiki I linked).
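
As a rough illustration of the depth-first approach (a sketch only, not the pseudocode from the linked wiki), the version below visits each item once and emits it after everything ranked below it. The name dfs_topsort and the three-state marking are choices made for this example, and the recursion assumes the input is small enough to stay within Python's recursion limit.

```python
def dfs_topsort(pairs):
    """Return items ordered highest to lowest, given (higher, lower) pairs."""
    graph = {}
    for u, v in pairs:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}
    postorder = []

    def visit(node):
        if state[node] == IN_PROGRESS:
            raise ValueError("cycle detected; no linear order exists")
        if state[node] == DONE:
            return
        state[node] = IN_PROGRESS
        for lower in graph[node]:
            visit(lower)
        state[node] = DONE
        postorder.append(node)  # appended only after all lower items

    for node in graph:
        visit(node)
    return postorder[::-1]  # reverse postorder is a topological order

print(dfs_topsort([('best', 'better'), ('better', 'good')]))
```

Each node and each pair is handled a constant number of times, which is where the linear running time comes from.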

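Answer 2: (score: 1)

You can take advantage of the fact that the lowest-ranked item in the list never appears at the start of any tuple. You can extract this lowest item, then remove from the list every element that contains it, and repeat to get the next-lowest item.

This should work even if you have redundant elements, or a sparser list like some of the examples here. I have broken it up into finding the lowest-ranked item, and then the clunky work of using it to build the final ranking.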

from copy import copy

def find_lowest_item(s):
    #Iterate over set of all items
    for item in set([item for sublist in s for item in sublist]):
        #If an item does not appear at the start of any tuple, return it
        if item not in [x[0] for x in s]:
            return item

def sort_by_comparison(s):
    final_list = []
    #Make a copy so we don't mutate original list
    new_s = copy(s)
    #Get the set of all items
    item_set = set([item for sublist in s for item in sublist])
    for i in range(len(item_set)):
        lowest = find_lowest_item(new_s)
        if lowest is not None:
            final_list.insert(0, lowest)
        #For the highest ranked item, we just compare our current 
        #ranked list with the full set of items
        else:
            final_list.insert(0,set(item_set).difference(set(final_list)).pop())
        #Update list of ranking tuples to remove processed items
        new_s = [x for x in new_s if lowest not in x]
    return final_list

list_to_compare = [('black', 'dark'), ('black', 'dim'), ('black', 'gloomy'), ('dark', 'gloomy'), ('dim', 'dark'), ('dim', 'gloomy')]
sort_by_comparison(list_to_compare)
  

['black','dim','dark','gloomy']

list2 = [('best','better'),('best','good'),('better','good')]
sort_by_comparison(list2)
  

['best','better','good']

list3 = [('best','better'),('better','good')]
sort_by_comparison(list3)
  

['best','better','good']

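Answer 3: (score: 1)

Here is one approach. It is based on using the complete pairwise rankings to create an old-style (early Python 2) cmp function, then using functools.cmp_to_key to convert it into a key suitable for Python 3's approach to sorting: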

import functools

def sortByRankings(rankings):
    def cmp(x,y):
        if x == y:
            return 0
        elif (x,y) in rankings:
            return -1
        else:
            return 1

    items = list({x for y in rankings for x in y})
    items.sort(key = functools.cmp_to_key(cmp))
    return items

Tested as follows:

ranks = [('a','c'), ('b','a'),('b','c'), ('d','a'), ('d','b'), ('d','c')]
print(sortByRankings(ranks)) #prints ['d', 'b', 'a', 'c']

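Note that to work correctly, the argument rankings must contain an entry for every pair of distinct items. If it doesn't, you would first need to compute the transitive closure of the pairs you do have before feeding it to this function.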
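
For the transitive closure step, a minimal sketch could look like the following. The function name transitive_closure is made up for this example, and the repeated-join approach is chosen for brevity rather than speed, so it is only reasonable for small inputs.

```python
def transitive_closure(pairs):
    """Return the set of all (a, c) implied by chains a > b > ... > c."""
    closure = set(pairs)
    while True:
        # Join every (a, b) with every (b, d) to derive (a, d).
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new_pairs <= closure:
            return closure  # nothing new was derived, so we are done
        closure |= new_pairs

sparse = [('best', 'better'), ('better', 'good')]
print(sorted(transitive_closure(sparse)))
```

Since the comparison function only tests membership with `(x,y) in rankings`, the returned set can be used wherever the original list of pairs was.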

Answer 4: (score: 0)

I found this to be a fairly good solution. Move the items into a duplicate-free list, then sort it using Python's built-in sort.

def order(a):
    newlist = []
    for listp in range(len(a)):
        for subp in range(len(a[listp])):
            if a[listp][subp] not in newlist:
                newlist.append(a[listp][subp])
    newlist.sort()  # note: sorts the strings alphabetically
    return newlist

s = [('best', 'better'), ('best', 'good'), ('better', 'good')]
print(order(s))

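Answer 5: (score: -1)

If you sort the list items or create a dictionary from them, you will lose the order that @Rockybilly mentioned in their answer. I suggest you create a list from the tuples of the original list and then remove the duplicates.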

def remove_duplicates(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

i = [(5,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
i = remove_duplicates(list(x for s in i for x in s))
print(i)  # prints [5, 2, 1, 3, 4]

j = [('excellent','good'),('excellent','great'),('great','good')]
j = remove_duplicates(list(x for s in j for x in s))
print(j)  # prints ['excellent', 'good', 'great']

See reference: How do you remove duplicates from a list in whilst preserving order?

For an explanation of the remove_duplicates() function, see this stackoverflow post

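Answer 6: (score: -3)

If the list is complete, meaning there is enough information to do the ranking (and there is no duplicate or redundant input), this will work.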

from collections import defaultdict
lst = [('best','better'),('best','good'),('better','good')]

d = defaultdict(int)

for tup in lst:
    d[tup[0]] += 1
    d[tup[1]] += 0 # To create it in defaultdict

print(sorted(d, key=lambda x: d[x], reverse=True))
# ['best', 'better', 'good']

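Simply give the items points, incrementing the left item's score each time it is encountered in the list.

Edit: I think the OP has one definite type of input, which always has a tuple count equal to the combination nCr(n,2). That makes this a correct solution. There is no need to complain about edge cases, which I already knew about when posting the answer (and mentioned).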
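
To make the nCr(n,2) assumption explicit before relying on the counting approach, one could check the number of distinct pairs against the number of distinct items. This is only a sketch: is_complete is a hypothetical name, and the check is a pure count, so it assumes the input has no contradictory pairs such as both (a,b) and (b,a).

```python
def is_complete(pairs):
    """True if the number of distinct pairs equals nCr(n, 2) for n items."""
    items = {x for pair in pairs for x in pair}
    n = len(items)
    return len(set(pairs)) == n * (n - 1) // 2

print(is_complete([('best', 'better'), ('best', 'good'), ('better', 'good')]))
print(is_complete([('best', 'better'), ('better', 'good')]))
```

When the check fails, the counts are no longer guaranteed to give a valid ranking, and a topological sort is the safer choice.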