Remove duplicate lists from a list of lists

Time: 2016-02-17 11:22:14

Tags: python

t = [[a, b], [c, d], [a, e], [f, g], [c, d]]

How can I get a list of unique lists, so that the output equals:

output = [[a, b], [c, d], [a, e], [f, g]]

[c, d] appears twice, so the second copy needs to be removed. [a, b] and [a, e] are distinct lists, even though both contain 'a'.

Thanks!

5 answers:

Answer 0: (score: 3)

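An OrderedDict preserves order and gives you unique elements once the sublists are mapped to tuples to make them hashable. Assigning to t[:] mutates the original list object in place.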

t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"]]

from collections import OrderedDict

t[:] = map(list, OrderedDict.fromkeys(map(tuple, t)))

print(t)
[['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g']]
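
As a side note that is not part of the original answer: on Python 3.7 and later, a plain dict also keeps insertion order, so the same idea can probably be written without the import. A minimal sketch under that version assumption:

```python
# Sketch assuming Python 3.7+, where plain dicts preserve insertion order.
t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"]]

# Tuples are hashable, so they can be dict keys; repeated keys collapse
# and the first occurrence fixes the position.
t[:] = map(list, dict.fromkeys(map(tuple, t)))

print(t)
```

On older interpreters the OrderedDict version is the safer choice.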

In Python 2, if you want to avoid creating intermediate lists, you can use itertools.imap:

from collections import OrderedDict
from itertools import imap

t[:] = imap(list, OrderedDict.fromkeys(imap(tuple, t)))

print(t)

You can also use set.add together with the or operator:

st = set()

t[:] = (st.add(tuple(sub)) or sub for sub in t if tuple(sub) not in st)

print(t)
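
The trick relies on set.add returning None, which is falsy, so the or expression always evaluates to sub while recording the tuple as a side effect. A small sketch to illustrate that behaviour (the variable name result is only for illustration):

```python
st = set()

# set.add mutates the set in place and returns None.
result = st.add(("a", "b"))
print(result)

# None is falsy, so `or` falls through to its right-hand operand.
print(result or ["a", "b"])
```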

Which approach is faster:

In [9]: t = [[randint(1,1000),randint(1,1000)] for _ in range(10000)]

In [10]: %%timeit                                                     
st = set()
[st.add(tuple(sub)) or sub for sub in t if tuple(sub) not in st]
   ....: 
100 loops, best of 3: 5.8 ms per loop

In [11]: timeit list(map(list, OrderedDict.fromkeys(map(tuple, t))))  
10 loops, best of 3: 24.1 ms per loop
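
The timings above come from an IPython session. To reproduce the comparison in a plain script, something along these lines should work; the function names with_set and with_ordered_dict are hypothetical wrappers, and the absolute numbers will differ by machine and Python version:

```python
from collections import OrderedDict
from random import randint
from timeit import timeit

t = [[randint(1, 1000), randint(1, 1000)] for _ in range(10000)]

def with_set(lists):
    # Hypothetical wrapper around the set.add / or approach.
    st = set()
    return [st.add(tuple(sub)) or sub for sub in lists if tuple(sub) not in st]

def with_ordered_dict(lists):
    # Hypothetical wrapper around the OrderedDict approach.
    return list(map(list, OrderedDict.fromkeys(map(tuple, lists))))

# Both approaches should agree before their timings are compared.
assert with_set(t) == with_ordered_dict(t)

print("set:", timeit(lambda: with_set(t), number=10))
print("OrderedDict:", timeit(lambda: with_ordered_dict(t), number=10))
```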

If ["a", "e"] should be treated as the same as ["e", "a"], you can use a frozenset:

t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"], ["e","a"]]
st = set()
t[:] = (st.add(frozenset(sub)) or sub for sub in t if frozenset(sub) not in st)

print(t)

Output:

[['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g']]
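
One caveat worth keeping in mind: a frozenset also discards repeated elements inside a sublist, so ["a", "a", "e"] would be treated as equal to ["a", "e"]. If that matters, a sorted tuple can serve as the key instead. This is only a sketch; unique_unordered is a hypothetical helper, and it assumes the elements can be sorted against each other:

```python
def unique_unordered(lists):
    # Hypothetical helper: sublists count as equal when they hold the same
    # elements in any order, but repeated elements are still significant.
    seen = set()
    for sub in lists:
        key = tuple(sorted(sub))  # assumes elements are hashable and sortable
        if key not in seen:
            seen.add(key)
            yield sub

t = [["a", "e"], ["e", "a"], ["a", "a", "e"]]
print(list(unique_unordered(t)))
```

Here ["e", "a"] is dropped as a reordering of ["a", "e"], while ["a", "a", "e"] is kept.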

To avoid calling tuple twice, you can write a function:

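def unique(l):
    # st holds the tuples seen so far; it walks the original sublists
    # in step with the loop over their tuple versions.
    st, it = set(), iter(l)
    for tup in map(tuple, l):
        if tup not in st:
            yield next(it)  # first occurrence: emit the original list
        else:
            next(it)        # duplicate: skip it so both iterators stay aligned
        st.add(tup)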

This runs slightly faster:

In [21]: timeit list(unique(t))
100 loops, best of 3: 5.06 ms per loop

Answer 1: (score: 2)

A simple solution:

t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"]]
output = []

for elem in t:
    if elem not in output:
        output.append(elem)

print(output)

Output:

[['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g']]
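
A trade-off to be aware of (an observation, not something stated in the original answer): the membership test on a list compares with == and scans the list, so this is quadratic in the worst case, but it needs no hashing. That means it should still work when the sublists contain unhashable items, where the tuple-based approaches fail. A small sketch with made-up data:

```python
# Made-up data: each sublist holds a nested list, so tuple(sub) is unhashable.
t = [[1, [2, 3]], [4, [5]], [1, [2, 3]]]
output = []

for elem in t:
    if elem not in output:  # compares with ==, no hashing required
        output.append(elem)

print(output)
```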

Answer 2: (score: 0)

You can do this with a set (if the order of the sublists in the result doesn't matter):

>>> t = [['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g'], ['c', 'd']]
>>> as_tuples = [tuple(l) for l in t]
>>> set(as_tuples)
{('a', 'b'), ('a', 'e'), ('c', 'd'), ('f', 'g')}
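
The result above is a set of tuples rather than a list of lists. If the original shape is needed, it can be converted back; a minimal sketch, where the name unique_lists is just for illustration and the sort is only there to make the order deterministic:

```python
t = [['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g'], ['c', 'd']]

# Convert each unique tuple back into a list.
unique_lists = [list(tup) for tup in set(map(tuple, t))]

# Set iteration order is arbitrary, so sort if a stable order is wanted.
unique_lists.sort()
print(unique_lists)
```

Note that sorting gives a stable order but not the original one.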

Answer 3: (score: 0)

A simple approach, assuming you don't want to copy the inner lists and want to keep allocations to a minimum.

# Assumption: nested_lst contains only lists of simple values (float, int, bool)
def squashDups( nested_lst ):
    ref_set = set()
    new_nested_lst = []
    for lst in nested_lst:
        tup = tuple(lst)
        if tup not in ref_set:
            new_nested_lst.append(lst)
            ref_set.add(tup)
    return new_nested_lst

>>> lst = [ [1,2], [3,4], [3,4], [1,2], [True,False], [False,True], [True,False] ]
>>> squashDups(lst)
[[1, 2], [3, 4], [True, False], [False, True]]

Answer 4: (score: -1)

If you care about order, this should work:

t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"]]
i = len(t) - 1
while i >= 0:
    if t.count(t[i]) > 1:
        t.pop(i)
    i -= 1
print(t)