Remove duplicates from a Python list but remember the indices

Time: 2016-01-02 19:43:36

Tags: python list python-2.7

How can I remove duplicates from a list, keep the original order of the items, and remember the first index of each item in the list?

For example, removing duplicates from [1, 1, 2, 3] yields [1, 2, 3], but I also need to remember the indices [0, 2, 3].

I'm using Python 2.7.

2 answers:

Answer 0 (score: 5)

I would do it a bit differently and use an OrderedDict together with the list's index method, which returns the lowest index of an item:

>>> from collections import OrderedDict
>>> lst = [1, 1, 2, 3]
>>> d = OrderedDict((x, lst.index(x)) for x in lst)
>>> d
OrderedDict([(1, 0), (2, 2), (3, 3)])

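If you need both the list (with duplicates removed) and the indices, you can simply call the following (this is Python 2.7 output; in Python 3, keys() and values() return views, so wrap them in list()):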

>>> d.keys()
[1, 2, 3]
>>> d.values()
[0, 2, 3]
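The values above come from calling lst.index(x) once per element, and each call scans the list from the start. A minimal sketch that keeps the same OrderedDict result but records the enumerate index only the first time a key appears. This is a swapped-in variant rather than the answer's original code, and the function name first_indices is hypothetical:

```python
from collections import OrderedDict


def first_indices(lst):
    """Map each distinct element to the index of its first occurrence, in order.

    Hypothetical helper for illustration; works on Python 2.7 and 3.
    """
    d = OrderedDict()
    for i, x in enumerate(lst):
        # setdefault stores i only when x is not already a key,
        # so the first index wins and later duplicates are ignored
        d.setdefault(x, i)
    return d


d = first_indices([1, 1, 2, 3])
print(list(d.keys()))    # [1, 2, 3]
print(list(d.values()))  # [0, 2, 3]
```

Each element is visited once, so this avoids the repeated scans that list.index performs.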

Answer 1 (score: 3)

Use enumerate to keep track of the indices and a set to keep track of the elements already seen:

l = [1, 1, 2, 3]
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append(i)
    seen.add(ele)

If you want both the index and the element, as pairs:

inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
    seen.add(ele)

Or, if you want the two in separate lists:

l = [1, 1, 2, 3]
inds, unq = [],[]
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append(i)
        unq.append(ele)
    seen.add(ele)
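If you already build the (index, element) pairs as in the version that appends (i, ele), the two separate lists can also be recovered by unzipping the pairs with zip(*...). A small sketch; the variable name pairs is just for illustration:

```python
l = [1, 1, 2, 3]
seen = set()
pairs = []
for i, ele in enumerate(l):
    if ele not in seen:
        pairs.append((i, ele))
        seen.add(ele)

# zip(*pairs) transposes [(0, 1), (2, 2), (3, 3)] into (0, 2, 3) and (1, 2, 3).
# Note: unpacking fails on an empty input list, because zip yields nothing.
inds, unq = zip(*pairs)
print(list(inds))  # [0, 2, 3]
print(list(unq))   # [1, 2, 3]
```

zip returns tuples, hence the list() calls if you need real lists.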

Using a set is by far the best approach. Timings in IPython (randint comes from the random module):

In [13]: l = [randint(1,10000) for _ in range(10000)]     

In [14]: %%timeit                                         
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
    seen.add(ele)
   ....: 
100 loops, best of 3: 3.08 ms per loop

In [15]: timeit  OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 442 ms per loop

In [16]: l = [randint(1,10000) for _ in range(100000)]      
In [17]: timeit  OrderedDict((x, l.index(x)) for x in l)
1 loops, best of 3: 10.3 s per loop

In [18]: %%timeit                                       
inds = []
seen = set()
for i, ele in enumerate(l):
    if ele not in seen:
        inds.append((i,ele))
    seen.add(ele)
   ....: 
10 loops, best of 3: 22.6 ms per loop

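So for 100k elements it is 10.3 s vs 22.6 ms, and if you try it with fewer dupes (e.g. [randint(1,100000) for _ in range(100000)]) you will have time to read a book. The gap comes from list.index: it scans the list from the start on every call, so the OrderedDict version does up to O(n) work per element and O(n²) overall, whereas a set membership test is O(1) on average and the loop stays O(n). Creating two separate lists is marginally slower, but still orders of magnitude faster than using list.index.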
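A rough way to reproduce this comparison outside IPython with the standard timeit module. The list size, value range, repeat count, and function names are arbitrary choices for this sketch, and absolute times will vary by machine:

```python
import timeit
from collections import OrderedDict
from random import randint


def with_index(l):
    # one list.index scan per element: quadratic overall
    return OrderedDict((x, l.index(x)) for x in l)


def with_set(l):
    # single pass with average O(1) set lookups
    inds = []
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            inds.append((i, ele))
            seen.add(ele)
    return inds


l = [randint(1, 1000) for _ in range(5000)]

# Both approaches agree once the (element, index) items are flipped to (index, element)
assert [(i, x) for x, i in with_index(l).items()] == with_set(l)

t_index = timeit.timeit(lambda: with_index(l), number=3)
t_set = timeit.timeit(lambda: with_set(l), number=3)
print("list.index: %.4f s, set: %.4f s" % (t_index, t_set))
```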

If you want to get the values one at a time, you can use a generator function:

def yield_un(l):
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i,ele)
        seen.add(ele)
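Because the generator is lazy, a caller can stop after the first few unique items without walking the rest of the input. A short usage sketch with itertools.islice. The generator is repeated so the snippet runs on its own, with seen.add moved inside the if, which does not change the result. The data list is made up for illustration:

```python
from itertools import islice


def yield_un(l):
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i, ele)
            seen.add(ele)


print(list(yield_un([1, 1, 2, 3])))  # [(0, 1), (2, 2), (3, 3)]

# islice pulls only as many items as requested; the rest of data is never examined
data = [5, 5, 7, 5, 9, 7, 11, 13]
print(list(islice(yield_un(data), 2)))  # [(0, 5), (2, 7)]
```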