Question

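I'm writing a program that finds the items that are repeated in an array. I got it working with an if statement, but for long arrays the if statement slows the program down, so I'd like to change it to use try/except instead.

Here is my code: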

array = ['11', '5', '4', 'hello', '11', '7', 'a', '4']
seen = []
repeats = []

for item in array:
    if item not in seen:
        seen.append(item)
    else:
        repeats.append(item)

print(repeats)

Output:

['11', '4']

Thanks

Answer 1

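On every iteration you do a linear search through seen (a list), so the total running time grows roughly quadratically with the length of the array.

The simplest fix is to use dictionary keys. Looking up a key in a dictionary is efficient because dictionaries are hash maps internally.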
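A rough sketch of the difference, assuming CPython: `item in seen` on a list compares against the stored elements one by one, while a set does a hash lookup. The function names and test data below are illustrative, and timings depend on the machine, so no specific numbers are claimed.

```python
import timeit

def repeats_with_list(items):
    """Original approach: each membership test scans the list, O(n) per check."""
    seen = []
    repeats = []
    for item in items:
        if item not in seen:
            seen.append(item)
        else:
            repeats.append(item)
    return repeats

def repeats_with_set(items):
    """Same logic, but membership is a hash lookup, O(1) on average."""
    seen = set()
    repeats = []
    for item in items:
        if item not in seen:
            seen.add(item)
        else:
            repeats.append(item)
    return repeats

# Hypothetical test data: 2,000 strings, each distinct value appearing twice.
data = [str(i % 1000) for i in range(2000)]

assert repeats_with_list(data) == repeats_with_set(data)
print("list:", timeit.timeit(lambda: repeats_with_list(data), number=1))
print("set: ", timeit.timeit(lambda: repeats_with_set(data), number=1))
```

Both functions return the same result; only the cost of the membership check changes.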

array = ['11', '5', '4', 'hello', '11', '7', 'a', '4']
seen = {}
repeats = []

for item in array:
    if item not in seen:
        seen[item] = None
    else:
        repeats.append(item)

print(repeats)
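Since the question asks about try/except specifically: the same dict-based loop can be written in EAFP style, as in this sketch (the function name is illustrative). Both versions do one hash lookup per item, so the speedup comes from the dict, not from the exception handling; raising and catching a KeyError for every first occurrence typically costs more than a plain `in` test.

```python
def find_repeats_eafp(items):
    """Hypothetical EAFP variant: try the dict lookup, handle KeyError on a miss."""
    seen = {}
    repeats = []
    for item in items:
        try:
            seen[item]            # raises KeyError if item has not been seen yet
        except KeyError:
            seen[item] = None     # first occurrence: remember it
        else:
            repeats.append(item)  # lookup succeeded: item is a repeat
    return repeats

array = ['11', '5', '4', 'hello', '11', '7', 'a', '4']
print(find_repeats_eafp(array))  # ['11', '4']
```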

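As a commenter pointed out, the more idiomatic choice is probably a set rather than a dict, since you don't need the dict's values, but it's worth knowing that both should give you the same performance: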

array = ['11', '5', '4', 'hello', '11', '7', 'a', '4']
seen = set()
repeats = []

for item in array:
    if item not in seen:
        seen.add(item)
    else:
        repeats.append(item)

print(repeats)
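One behavior worth noting: with this loop, a value that occurs three times is appended to repeats twice. If each repeated value should be reported only once, a small variation works; this is a sketch, and the function name is illustrative. It uses a dict as an insertion-ordered set, which relies on dicts preserving insertion order (guaranteed since Python 3.7).

```python
def unique_repeats(items):
    """Return each repeated value once, in order of its second occurrence."""
    seen = set()
    repeats = {}  # dict used as an ordered set; values are unused
    for item in items:
        if item in seen:
            repeats[item] = None  # re-adding an existing key has no effect
        else:
            seen.add(item)
    return list(repeats)

print(unique_repeats(['a', 'b', 'a', 'c', 'a', 'b']))  # ['a', 'b']
```

For the same input, the loop above would produce ['a', 'a', 'b'].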

Answer 2

try/except isn't a good fit for this problem.

Just use the Counter class from the collections module.

from collections import Counter

array = ['11', '5', '4', 'hello', '11', '7', 'a', '4']

repeats = [key for key, value in Counter(array).items() if value > 1]

print(repeats)  # prints ['11', '4']

If you don't want to use the Counter class, you can use Michel's solution.

I tested the performance; here are the results:

For n=10^3
Using Counter:          0.0011416s
Using sets and if-else: 0.0006266s

For n=5*10^3
Using Counter:          0.0024912s
Using sets and if-else: 0.0027905s

For n=10^5
Using Counter:          0.0041075s
Using sets and if-else: 0.0054351s

For n=10^6
Using Counter:          0.0333123s
Using sets and if-else: 0.0513704s
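The benchmark code and data behind these numbers aren't shown. Here is a minimal sketch of how a similar comparison could be run with timeit, assuming random integers drawn from range(n // 2) as test data (an assumption, not the author's setup). Results will vary with machine, Python version, and how many duplicates the data contains; n = 10^6 is left out only to keep the run short.

```python
import random
import timeit
from collections import Counter

def with_counter(items):
    """Counter approach: each repeated value is listed once."""
    return [key for key, value in Counter(items).items() if value > 1]

def with_set(items):
    """Set + if/else approach: a value is appended on each extra occurrence."""
    seen = set()
    repeats = []
    for item in items:
        if item not in seen:
            seen.add(item)
        else:
            repeats.append(item)
    return repeats

for n in (10**3, 5 * 10**3, 10**5):
    # Assumed data: n random ints from range(n // 2), so many duplicates.
    data = [random.randrange(n // 2) for _ in range(n)]
    runs = 10
    t_counter = timeit.timeit(lambda: with_counter(data), number=runs) / runs
    t_set = timeit.timeit(lambda: with_set(data), number=runs) / runs
    print(f"n={n}: Counter {t_counter:.7f}s, set+if/else {t_set:.7f}s")
```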

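Update: this may not be relevant to the OP's question, but I'm adding it for future readers.

I tested again, this time varying how unique the elements are. For lists of size n < 10^4, using a set with if-else performs better than Counter.

For sizes above 10^5 with 50% redundancy, Counter is faster only by a margin of ~0.01 s. With redundancy around 70-80%, however, Counter is faster by ~0.02 s (again for n on the order of 10^5 or larger). Uniqueness percentage = len(set(my_list))/len(my_list).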

Answer 3

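You can also write a very neat implementation with Python's built-in Counter, which efficiently gives you the count of each value and has nice semantics for subtracting one counter from another.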

from collections import Counter

array = ['11', '5', '4', 'hello', '11', '7', 'a', '4']
a = Counter(array)
# a == Counter({'11': 2, '4': 2, '5': 1, 'hello': 1, '7': 1, 'a': 1})
seen = a.keys()
# seen == dict_keys(['11', '5', '4', 'hello', '7', 'a'])
b = Counter(a.keys())
# b == Counter({'11': 1, '5': 1, '4': 1, 'hello': 1, '7': 1, 'a': 1})
# here is the neat trick: Counter subtraction drops
# any keys that end up with a count of 0 or less
repeat = (a - b)
# repeat == Counter({'11': 1, '4': 1})
remaining = list(repeat.keys())
# ['11', '4']
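To make the subtraction trick concrete, this sketch contrasts the `-` operator, which keeps only positive counts, with the in-place `Counter.subtract()` method, which keeps zero and negative counts. The counters here are made-up examples.

```python
from collections import Counter

a = Counter({'x': 3, 'y': 1})
b = Counter({'x': 1, 'y': 1, 'z': 1})

# The - operator keeps only positive counts, so 'y' (0) and 'z' (-1) disappear.
print(a - b)  # Counter({'x': 2})

# Counter.subtract() works in place and keeps zero and negative counts.
c = a.copy()
c.subtract(b)
print(c)  # Counter({'x': 2, 'y': 0, 'z': -1})
```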

Using 'try/except' instead of 'if' to find duplicates in a list
