How to remove duplicates from a list of tuples when order matters

Time: 2017-11-12 08:33:00

Tags: python list duplicates tuples

I've seen some similar answers, but I couldn't find anything specific to this case.

I have a list of tuples:

[(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]

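What I want is to remove a tuple from the list only if its first element has already appeared earlier in the list, and the tuple that remains should be the one with the smallest second element.

So the output should look like this: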

[(5, 0), (3, 1), (6, 4)]

6 answers:

Answer 0 (score: 3)

This is a linear-time approach that requires two passes over the original list.

t = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)] # test case 1
#t = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)] # test case 2
smallest = {}
inf = float('inf')

for first, second in t:
    if smallest.get(first, inf) > second:
        smallest[first] = second

result = []
seen = set()

for first, second in t:
    if first not in seen and second == smallest[first]:
        seen.add(first)
        result.append((first, second))

print(result) # [(5, 0), (3, 1), (6, 4)] for test case 1
              # [(3, 1), (5, 0), (6, 4)] for test case 2
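
For reuse, the two passes above can be wrapped in a function. This is only a sketch of the same logic; the name `keep_smallest_per_key` is hypothetical and not part of the original answer.

```python
def keep_smallest_per_key(pairs):
    """For each first element, keep the tuple with the smallest second element.

    Each survivor stays where that smallest tuple sat in the input.
    """
    inf = float('inf')

    # Pass 1: record the smallest second element seen for each first element.
    smallest = {}
    for first, second in pairs:
        if smallest.get(first, inf) > second:
            smallest[first] = second

    # Pass 2: emit the first tuple that matches the recorded minimum.
    result = []
    seen = set()
    for first, second in pairs:
        if first not in seen and second == smallest[first]:
            seen.add(first)
            result.append((first, second))
    return result


print(keep_smallest_per_key([(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]))
# [(5, 0), (3, 1), (6, 4)]
print(keep_smallest_per_key([(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]))
# [(3, 1), (5, 0), (6, 4)]
```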

Answer 1 (score: 2)

Here is a compact version I created using OrderedDict, which skips the replacement if the new value is larger than the old one.

from collections import OrderedDict

a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
d = OrderedDict()

for item in a:

    # Get the old value from the dictionary, if it exists
    old = d.get(item[0])

    # Skip if new item is larger than old
    if old:
        if item[1] > old[1]:
            continue
        #else:
        #    del d[item[0]]

    # Assign
    d[item[0]] = item

list(d.values())

Returns:

[(5, 0), (3, 1), (6, 4)]

Or, if you enable the else branch (currently commented out):

[(3, 1), (5, 0), (6, 4)]
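
On Python 3.7 and later, plain dicts also preserve insertion order, so the same idea may work without OrderedDict. The sketch below assumes that Python version; the function name `dedupe_keep_min` is hypothetical. It mirrors the variant without the else branch: each key keeps the position of its first occurrence.

```python
def dedupe_keep_min(pairs):
    # Plain dicts preserve insertion order on Python 3.7+ (assumed here).
    best = {}
    for item in pairs:
        key = item[0]
        # Overwriting an existing key keeps its original position in the dict.
        if key not in best or item[1] < best[key][1]:
            best[key] = item
    return list(best.values())


print(dedupe_keep_min([(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]))
# [(5, 0), (3, 1), (6, 4)]
```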

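Answer 2 (score: 1)

It seems to me that you need to know two things:

  1. The tuple with the smallest second element for each first element.
  2. The order in which each first element should appear in the new list.

We can get #1 using itertools.groupby and the min function.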

    import itertools
    import operator
    
    lst = [(3, 1), (5, 3), (5, 0), (3, 2), (6, 4)]
    # I changed this slightly to make it harder to accidentally succeed.
    # correct final order should be [(3, 1), (5, 0), (6, 4)]
    
    tmplst = sorted(lst, key=operator.itemgetter(0))
    groups = itertools.groupby(tmplst, operator.itemgetter(0))
    # group by first element, in this case this looks like:
    # [(3, [(3, 1), (3, 2)]), (5, [(5, 3), (5, 0)]), (6, [(6, 4)])]
    # note that groupby only works on sorted lists, so we need to sort this first
    
    min_tuples = {min(v, key=operator.itemgetter(1)) for _, v in groups}
    # pick the best tuple for each first element. In this case:
    # {(3, 1), (5, 0), (6, 4)}
    # (note that this is a set comprehension, for faster lookups later)
    

Now that we know what the result set looks like, we can walk through lst again to put the tuples in the correct order.

    seen = set()
    result = []
    for el in lst:
        if el not in min_tuples:  # don't add to result
            continue
        elif el not in seen:      # add to result and mark as seen
            result.append(el)
            seen.add(el)
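
The snippets above never display `result`, so here is a self-contained sketch that combines both steps and prints the outcome. The function name `min_tuples_in_original_order` is hypothetical. Because of the sort, this variant is O(n log n) rather than linear.

```python
import itertools
import operator


def min_tuples_in_original_order(lst):
    first = operator.itemgetter(0)
    second = operator.itemgetter(1)

    # groupby only merges adjacent items, so sort by the first element beforehand.
    groups = itertools.groupby(sorted(lst, key=first), key=first)
    min_tuples = {min(group, key=second) for _, group in groups}

    # Walk the original list so survivors keep their original relative order.
    seen = set()
    result = []
    for el in lst:
        if el in min_tuples and el not in seen:
            result.append(el)
            seen.add(el)
    return result


print(min_tuples_in_original_order([(3, 1), (5, 3), (5, 0), (3, 2), (6, 4)]))
# [(3, 1), (5, 0), (6, 4)]
```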
    

Answer 3 (score: 0)

This will do what you need:

# I switched (5, 3) and (5, 0) to demonstrate sorting capabilities.
list_a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]

# Create a list to contain the results
list_b = []

# Create a list to check for duplicates
l = []

# Sort list_a by the second element of each tuple to ensure the smallest numbers
list_a.sort(key=lambda i: i[1])

# Iterate through every tuple in list_a
for i in list_a:

    # Check if the 0th element of the tuple is in the duplicates list; if not:
    if i[0] not in l:

        # Add the tuple the loop is currently on to the results; and
        list_b.append(i)

        # Add the 0th element of the tuple to the duplicates list
        l.append(i[0])

>>> print(list_b)
[(5, 0), (3, 1), (6, 4)]
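
Note that `list_a.sort(...)` reorders the input in place, and `in` on a list is a linear scan. A possible variation, sketched under the assumption that the input should stay untouched, uses `sorted()` and a set; the function name `dedupe_sorted_by_second` is hypothetical. As with the code above, the result is ordered by second element, not by original position.

```python
def dedupe_sorted_by_second(pairs):
    result = []
    seen_firsts = set()  # set membership is O(1) on average, unlike a list
    # sorted() returns a new list, so the caller's list is left untouched.
    for item in sorted(pairs, key=lambda t: t[1]):
        if item[0] not in seen_firsts:
            result.append(item)
            seen_firsts.add(item[0])
    return result


list_a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
print(dedupe_sorted_by_second(list_a))
# [(5, 0), (3, 1), (6, 4)]
print(list_a)
# [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
```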

Hope this helps!

Answer 4 (score: 0)

Using enumerate() and a list comprehension:

def remove_if_first_index(l):
    return [item for index, item in enumerate(l) if item[0] not in [value[0] for value in l[0:index]]]

Using enumerate() and a for loop:

def remove_if_first_index(l):

    # The list to store the return value
    ret = []

    # Get each index and item from the list passed in
    for index, item in enumerate(l):

        # Get the first number in each tuple up to the index we're currently at
        previous_values = [value[0] for value in l[0:index]]

        # If the item's first number is not in the list of previously encountered first numbers
        if item[0] not in previous_values:
            # Append it to the return list
            ret.append(item)

    return ret

Test:

some_list = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
print(remove_if_first_index(some_list))
# [(5, 0), (3, 1), (6, 4)]
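
Both versions above rebuild the list of earlier first elements for every item, which is quadratic in the length of the list. A set-based sketch of the same first-occurrence rule runs in linear time; the name `remove_if_first_seen` is hypothetical. Like the functions above, it keeps the first tuple seen for each first element, which is not necessarily the one with the smallest second element.

```python
def remove_if_first_seen(pairs):
    seen = set()   # first elements encountered so far
    result = []
    for item in pairs:
        if item[0] not in seen:
            seen.add(item[0])
            result.append(item)
    return result


print(remove_if_first_seen([(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]))
# [(5, 0), (3, 1), (6, 4)]
print(remove_if_first_seen([(5, 3), (3, 1), (5, 0)]))
# [(5, 3), (3, 1)]
```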

Answer 5 (score: 0)

I came up with this idea without having seen @Anton vBR's answer.

import collections

inp = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]

od = collections.OrderedDict()
for i1, i2 in inp:
    if i2 <= od.get(i1, i2):
        od.pop(i1, None)
        od[i1] = i2
outp = list(od.items())
print(outp)
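
The `od.pop(i1, None)` call is what decides where a replaced key ends up: removing the key first makes the re-insert land at the end of the OrderedDict. The sketch below makes that choice explicit with a hypothetical `reposition` flag (not part of the original answer) and uses an input where the smaller tuple arrives later.

```python
from collections import OrderedDict


def dedupe(pairs, reposition=True):
    od = OrderedDict()
    for first, second in pairs:
        if second <= od.get(first, second):
            if reposition:
                # Dropping the key first makes the re-insert land at the end.
                od.pop(first, None)
            od[first] = second
    return list(od.items())


inp = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
print(dedupe(inp))                    # the smaller tuple keeps its own position
# [(3, 1), (5, 0), (6, 4)]
print(dedupe(inp, reposition=False))  # the key keeps the position of its first occurrence
# [(5, 0), (3, 1), (6, 4)]
```

Because the comparison is `<=`, an exact repeat of the current minimum is also moved to the later position when `reposition` is true.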