How to compare two lists and order one by the number of matches

Time: 2016-08-29 06:59:56

Tags: python

Suppose you have a list such as:

first_list = ['a', 'b', 'c']

And you have the following second list:

second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

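How can I write a function that simply reorders the second list by the total number of times the elements of the first list match any part of each individual string in the second list?

To clarify further:

  • In the second list, the string 'a' contains 1 match
  • The string 'a b c' contains 3 matches
  • Once the function has run, the second example list would end up essentially in reverse order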
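To make the counting rule concrete, here is a minimal sketch of how the matches above could be tallied. The helper name count_matches is hypothetical (it is not part of the question), and it assumes that "match" means a non-overlapping substring occurrence, which is what str.count reports.

```python
first_list = ['a', 'b', 'c']
second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

def count_matches(keywords, text):
    """Hypothetical helper: total occurrences of every keyword inside text."""
    # str.count returns the number of non-overlapping occurrences.
    return sum(text.count(keyword) for keyword in keywords)

scores = [count_matches(first_list, s) for s in second_list]
print(scores)  # [1, 3, 3, 6]
```

Note that 'a b c' and 'abc zyx' tie at 3 under this rule, so their relative order depends on how the sort treats ties.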

My attempt:

first_list = ['a', 'b', 'c']
second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

print(second_list)

i = 0
for keyword in first_list:
    matches = 0
    for s in second_list:
        matches += s.count(keyword)
        if matches > second_list[0].count(keyword):
            popped = second_list.pop(i)
            second_list.insert(0, popped)

print(second_list)

3 answers:

Answer 0: (score: 3)

The most straightforward approach is to use the key parameter of the sorted built-in function:

>>> sorted(second_list, key = lambda s: sum(s.count(x) for x in first_list), reverse=True)
['ab cc ac', 'a b c', 'abc zyx', 'a']

The key function is called once for each item in the list being sorted. This is still inefficient, though, because each count call takes linear time.
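One possible way to cut down the repeated scanning, sketched here under a strong assumption: every keyword is a single character, as in the example. In that case each string can be tallied once with collections.Counter instead of being scanned once per keyword. The helper name char_score is hypothetical, and this shortcut does not apply to multi-character keywords.

```python
from collections import Counter

first_list = ['a', 'b', 'c']
second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

def char_score(text, keywords):
    # Assumption: each keyword is exactly one character, so a single pass
    # over the string is enough to count all of them.
    tally = Counter(text)
    # A Counter returns 0 for characters it has not seen.
    return sum(tally[k] for k in keywords)

result = sorted(second_list, key=lambda s: char_score(s, first_list), reverse=True)
print(result)  # ['ab cc ac', 'a b c', 'abc zyx', 'a']
```

For short lists like the one in the question the difference is negligible; the plain count version is easier to read.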

Answer 1: (score: 2)

A similar answer:

first_list = ['a', 'b', 'c']    
second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

#Find occurrences
list_for_sorting = []
for string in second_list:
    occurrences = 0
    for item in first_list:
        occurrences += string.count(item)

    list_for_sorting.append((occurrences, string))

#Sort list
sorted_by_occurrence = sorted(list_for_sorting, key=lambda tup: tup[0], reverse=True)
final_list = [i[1] for i in sorted_by_occurrence]
print(final_list)

['ab cc ac', 'a b c', 'abc zyx', 'a']
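Since the question asks for a function that reorders the second list itself, the same idea could also be wrapped up with list.sort, which sorts in place. This is only a sketch; the function name reorder_by_matches is made up for illustration.

```python
def reorder_by_matches(keywords, strings):
    """Hypothetical helper: sort strings in place, most keyword matches first."""
    # list.sort mutates the list and returns None; reverse=True keeps
    # tied items in their original relative order.
    strings.sort(key=lambda s: sum(s.count(k) for k in keywords), reverse=True)

first_list = ['a', 'b', 'c']
second_list = ['a', 'a b c', 'abc zyx', 'ab cc ac']

reorder_by_matches(first_list, second_list)
print(second_list)  # ['ab cc ac', 'a b c', 'abc zyx', 'a']
```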

Answer 2: (score: 1)

Here is an unstable approach (tied strings are not kept in their original order):

>>> l1 = ['a', 'b', 'c']
>>> l2 = ['a', 'a b c', 'abc zyx', 'ab cc ac']
>>> [s for _, s in sorted(((sum(s2.count(s1) for s1 in l1), s2) for s2 in l2), reverse=True)]
['ab cc ac', 'abc zyx', 'a b c', 'a']

If a stable sort is needed, you can use enumerate:

>>> l1 = ['a', 'b', 'c']
>>> l2 = ['a', 'a b c', 'ccc ccc', 'bb bb bb', 'aa aa aa']
>>> [x[-1] for x in sorted(((sum(s2.count(s1) for s1 in l1), -i, s2) for i, s2 in enumerate(l2)), reverse=True)]
['ccc ccc', 'bb bb bb', 'aa aa aa', 'a b c', 'a']
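As an alternative sketch (not part of the answer above), a stable descending order can also be obtained without enumerate by negating the score in a key function, because sorted is guaranteed to be stable. The helper name score is hypothetical.

```python
l1 = ['a', 'b', 'c']
l2 = ['a', 'a b c', 'ccc ccc', 'bb bb bb', 'aa aa aa']

def score(s):
    # Total occurrences of every element of l1 inside s.
    return sum(s.count(k) for k in l1)

# Sorting ascending on the negated score puts the highest scores first,
# and ties stay in their original order because sorted is stable.
stable = sorted(l2, key=lambda s: -score(s))
print(stable)  # ['ccc ccc', 'bb bb bb', 'aa aa aa', 'a b c', 'a']
```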

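The first version builds tuples in which the second item is the string from l2 and the first item is its number of matches against l1 (shown here with the l2 from the first example):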

>>> tuples = [(sum(s2.count(s1) for s1 in l1), s2) for s2 in l2]
>>> tuples
[(1, 'a'), (3, 'a b c'), (3, 'abc zyx'), (6, 'ab cc ac')]

These tuples are then sorted in descending order:

>>> tuples = sorted(tuples, reverse=True)
>>> tuples
[(6, 'ab cc ac'), (3, 'abc zyx'), (3, 'a b c'), (1, 'a')]

Finally, only the strings are kept:

>>> [s for _, s in tuples]
['ab cc ac', 'abc zyx', 'a b c', 'a']

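In the second version, the tuples also carry the negated index, so that tied strings keep their original order under the reversed sort: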

>>> [(sum(s2.count(s1) for s1 in l1), -i, s2) for i, s2 in enumerate(l2)]
[(1, 0, 'a'), (3, -1, 'a b c'), (3, -2, 'abc zyx'), (6, -3, 'ab cc ac')]