Question

Given the following list of lists (three separate lists inside one larger list):

myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
             ]

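I need to loop through each list and check whether element 0 and element 1 are the same as those of any other list. If they match, the later list should be removed (so in my example, the middle list is removed).

The list needs to be updated each time an item is removed from it.

Does anyone have any ideas?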

Answer 1

Use a dictionary keyed on the first two items:

>>> lis = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]
>>> from collections import OrderedDict
>>> dic = OrderedDict()
>>> for item in lis:
...     key = tuple(item[:2])
...     if key not in dic:
...         dic[key] = item
...         
>>> list(dic.values())  # list() is needed on Python 3, where values() returns a view
[
 ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
 ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
]
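
On Python 3.7 and later a plain dict also preserves insertion order, so the same idea can be sketched without OrderedDict. The function name dedupe_by_prefix and its n parameter are illustrative and not part of the original answer; the sample rows are shortened for readability.

```python
def dedupe_by_prefix(rows, n=2):
    """Keep the first row seen for each distinct prefix of length n."""
    first_seen = {}
    for row in rows:
        # setdefault stores the row only if the key is not present yet,
        # so later rows with the same prefix are ignored.
        first_seen.setdefault(tuple(row[:n]), row)
    return list(first_seen.values())


lis = [['test', 'xxxx', 'DDDt'],
       ['test', 'xxxx', 'other'],
       ['test', 'X1', 'DDDt']]
result = dedupe_by_prefix(lis)
print(result)
```

The keys are converted to tuples because lists are not hashable and cannot be used as dictionary keys.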

Answer 2

Use a list comprehension with a set to track what has already been seen:

myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']
             ]

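seen = set()
# seen.add() returns None, so "not seen.add(...)" is always True;
# it is there only to record the key as a side effect.
print([li for li in myvariable
       if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))])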

Prints:

[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
 ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]

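Because a list comprehension processes items in order, the original order is preserved and it is the later duplicates that get removed: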

>>> lis=[[1,2,1],
...      [3,4,1],
...      [1,2,2],
...      [3,4,2]]
>>> seen=set()
>>> [li for li in lis if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]
[[1, 2, 1], [3, 4, 1]]
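
The comprehension relies on set.add() returning None, which some readers find hard to follow. As a sketch, the same logic can be written as an explicit loop; the name dedupe_keep_first is made up for this illustration.

```python
def dedupe_keep_first(rows):
    """Return rows whose first two elements have not been seen before."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[:2])   # lists are unhashable, so use a tuple as the key
        if key in seen:
            continue           # a row with this prefix was already kept
        seen.add(key)
        kept.append(row)
    return kept


lis = [[1, 2, 1],
       [3, 4, 1],
       [1, 2, 2],
       [3, 4, 2]]
result = dedupe_keep_first(lis)
add_return_value = set().add(1)  # None, which is why "not seen.add(...)" is True
print(result)
```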

It is also worth noting that this approach is faster:

from collections import OrderedDict  

lis=[[1,2,1],
     [3,4,1],
     [1,2,2],
     [3,4,2]]

def f1(lis):
    seen=set()
    return [li for li in lis 
             if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]       

def f2(lis):
    dic = OrderedDict()
    for item in lis:
        key = tuple(item[:2])
        if key not in dic:
            dic[key] = item

    return list(dic.values())  # list() so Python 3 also returns a list

if __name__ == '__main__':
    import timeit            
    print('f1, LC+set:', timeit.timeit("f1(lis)", setup="from __main__ import f1,lis"), 'secs')
    print('f2, OrderedDic:', timeit.timeit("f2(lis)", setup="from __main__ import f2,lis,OrderedDict"), 'secs')

Prints:

f1, LC+set: 2.81167197227 secs
f2, OrderedDic: 16.4299631119 secs

So in this run the list comprehension with a set is nearly 6 times faster.
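
A four-row list mostly measures fixed overhead, and results will differ between Python versions and machines. The following is a rough sketch for checking that both functions agree and for timing them on a larger input. The make_rows helper, its parameters, and the row count are assumptions made for this illustration; no particular timing result is implied.

```python
import random
import timeit
from collections import OrderedDict


def f1(lis):
    seen = set()
    return [li for li in lis
            if tuple(li[:2]) not in seen and not seen.add(tuple(li[:2]))]


def f2(lis):
    dic = OrderedDict()
    for item in lis:
        key = tuple(item[:2])
        if key not in dic:
            dic[key] = item
    return list(dic.values())


def make_rows(count, distinct=100, seed=0):
    """Hypothetical helper: build rows whose first two items repeat often."""
    rng = random.Random(seed)
    return [[rng.randrange(distinct), rng.randrange(distinct), i]
            for i in range(count)]


rows = make_rows(10000)
same = f1(rows) == f2(rows)   # both should keep the same rows in the same order
print('results agree:', same)

for name, func in (('f1', f1), ('f2', f2)):
    secs = timeit.timeit(lambda: func(rows), number=20)
    print(name, secs, 'secs')
```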

Answer 3

This list comprehension preserves order and eliminates every duplicate after the first.

>>> check = [L[0:2] for L in myvariable]
>>> [el for i, el in enumerate(myvariable) if el[0:2] not in check[:i]]
[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]

Here is a solution that combines a list comprehension with a standard dict; it scales better to larger lists.

>>> d={}
>>> [d.setdefault(tuple(el[:2]), el) for el in myvariable if tuple(el[:2]) not in d]
[['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'], 
['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]
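
The question asks for the list itself to be updated. The snippets above build a new list rather than changing myvariable, and deleting items from a list while iterating over it can skip elements. One way to update the existing list object, shown here only as a sketch, is to collect the rows to keep and then use slice assignment. The alias variable exists only to demonstrate that the same object is modified.

```python
myvariable = [['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
              ['test', 'xxxx', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test'],
              ['test', 'X1', 'DDDt', 'EEEst', '2323t', 'test', 'test', 'test']]
alias = myvariable  # second reference to the same list object

seen = set()
kept = []
for row in myvariable:
    key = tuple(row[:2])
    if key not in seen:
        seen.add(key)
        kept.append(row)

# Slice assignment replaces the contents of the existing list,
# so every reference to it (including alias) sees the update.
myvariable[:] = kept
print(len(alias))
```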
