Python: deduplicating lists of tuples

Date: 2013-08-07 21:06:51

Tags: python list tuples deduplication

I am trying to deduplicate several different lists of tuples, one after another. The lists look like this:

A = [
     (('X','Y','Z',2,3,4), ('A','B','C',5,10,11)),
     (('A','B','C',5,10,11), ('X','Y','Z',2,3,4)),
     (('T','F','J',0,1,0), ('H','G','K',2,8,7)),
     ...                                          ]

B = [
     (('X','Y','Z',0,0,0), ('A','B','C',3,3,2)),
     (('A','B','C',3,3,2), ('X','Y','Z',0,0,0)),
     (('J','K','L',5,4,3), ('V','T','D',5,10,12)),
     ...                                          ]

I am running the following (for list A, for example):

from collections import OrderedDict
values = [[x,y] for x, y in OrderedDict.fromkeys(frozenset(x) for x in A)]

and I get:

 A = [
     (('X','Y','Z',2,3,4), ('A','B','C',5,10,11)),
     (('T','F','J',0,1,0), ('H','G','K',2,8,7)),
     ...                                         ]

However, if I repeat this for B, it may keep the second tuple of a duplicate pair instead of the first:

B = [
     (('A','B','C',3,3,2), ('X','Y','Z',0,0,0)),
     (('J','K','L',5,4,3), ('V','T','D',5,10,12)),
     ...                                         ]

Ideally, B should be:

B = [
     (('X','Y','Z',0,0,0), ('A','B','C',3,3,2)),
     (('J','K','L',5,4,3), ('V','T','D',5,10,12)),
     ...                                          ] 

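I need the string sequences to come out in the same order in every list, because I will use them to concatenate the numeric values from A, B, and so on. Is there a way to keep the selection consistent, so that deduplication always keeps the same element of each duplicate pair? Thanks!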

1 answer:

Answer 0: (score: 2)

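To preserve the original order, iterate over the pairs and keep track of what you have already seen. Only include pairs that have not been seen yet: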

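def dedup(lst):
    """Drop order-insensitive duplicates from lst, keeping first occurrences."""
    seen = set()  # frozensets of the pairs already kept
    result = []
    for item in lst:
        fs = frozenset(item)  # (a, b) and (b, a) map to the same key
        if fs not in seen:
            result.append(item)  # keep the original tuple, order intact
            seen.add(fs)
    return result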
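
For context on why the one-liner in the question can flip a pair: it unpacks each frozenset back into x and y, and a set has no notion of a first or second element. The following is a minimal sketch of that effect; the names pair, first, second, and rebuilt are illustrative and do not come from the question.

```python
# Illustrative pair taken from list B in the question.
pair = (('X', 'Y', 'Z', 0, 0, 0), ('A', 'B', 'C', 3, 3, 2))

# Unpacking a frozenset follows set iteration order, which depends on
# hashing rather than on the order of the original tuple.
first, second = frozenset(pair)

# The rebuilt pair holds the same two elements, but possibly swapped.
rebuilt = (first, second)
print(rebuilt == pair or rebuilt == pair[::-1])  # True
print(frozenset(rebuilt) == frozenset(pair))     # True
```

The dedup function avoids this by using the frozenset only as a lookup key and appending the original tuple unchanged.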

Example:

>>> import pprint
>>> A = [
...      (('X','Y','Z',2,3,4), ('A','B','C',5,10,11)),
...      (('A','B','C',5,10,11), ('X','Y','Z',2,3,4)),
...      (('T','F','J',0,1,0), ('H','G','K',2,8,7)),
...     ]
>>> pprint.pprint(dedup(A))
[(('X', 'Y', 'Z', 2, 3, 4), ('A', 'B', 'C', 5, 10, 11)),
 (('T', 'F', 'J', 0, 1, 0), ('H', 'G', 'K', 2, 8, 7))]
>>> B = [
...      (('X','Y','Z',0,0,0), ('A','B','C',3,3,2)),
...      (('A','B','C',3,3,2), ('X','Y','Z',0,0,0)),
...      (('J','K','L',5,4,3), ('V','T','D',5,10,12)),
...     ]
>>> pprint.pprint(dedup(B))
[(('X', 'Y', 'Z', 0, 0, 0), ('A', 'B', 'C', 3, 3, 2)),
 (('J', 'K', 'L', 5, 4, 3), ('V', 'T', 'D', 5, 10, 12))]
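
If you would rather stay close to the OrderedDict approach from the question, the same idea can probably be expressed by using the frozenset as the key and the original pair as the value. This is only a sketch under that assumption; the function name dedup_first_seen is hypothetical and not part of the original answer.

```python
from collections import OrderedDict


def dedup_first_seen(pairs):
    """Keep the first occurrence of each pair, ignoring order within a pair."""
    first_seen = OrderedDict()
    for pair in pairs:
        # setdefault stores the pair only if this key is new, so the
        # first-seen orientation of each pair is the one that survives.
        first_seen.setdefault(frozenset(pair), pair)
    return list(first_seen.values())


B = [
    (('X', 'Y', 'Z', 0, 0, 0), ('A', 'B', 'C', 3, 3, 2)),
    (('A', 'B', 'C', 3, 3, 2), ('X', 'Y', 'Z', 0, 0, 0)),
    (('J', 'K', 'L', 5, 4, 3), ('V', 'T', 'D', 5, 10, 12)),
]

result = dedup_first_seen(B)
print(result == [B[0], B[2]])  # True
```

Because the stored value is the tuple exactly as it first appeared, the result does not depend on set iteration order, so every list is handled by the same rule: the first orientation encountered is kept.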