A list with N (a large number of) elements:
from random import randint
eles = [randint(0, 10) for i in range(3000000)]
I'm trying to find the best way (in terms of performance and resource consumption) to implement the following function:
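def mosty(lst):
    # Sort (value, index) pairs so that equal values end up adjacent
    sort = sorted((v, k) for k, v in enumerate(lst))
    count, maxi, last_ele, idxs = 0, 0, None, []
    results = None  # stays None for an empty input list
    for ele, idx in sort:
        if last_ele != ele:
            # A new run of equal values starts here
            count = 1
            idxs = []
        else:
            count += 1
        idxs.append(idx)
        if maxi < count:
            # Strict '<': on ties the first (smallest) value keeps the lead
            results = (ele, count, idxs)
            maxi = count
        last_ele = ele
    return results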
This function returns the most common element, the number of times it occurs, and the indices at which it was found.
Here is the benchmark, with 300000 elements.
But I think it can be improved. One reason is Python 3's sorted function (timsort): if it returned a generator, wouldn't I avoid iterating over the list twice?
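The "avoid iterating twice" idea can be pushed further by dropping the sort altogether. Below is a minimal sketch (the helper name most_common_with_indices is hypothetical, not from the post) that groups indices by value in a single O(n) pass with collections.defaultdict and returns the same (element, count, indices) shape as mosty.

```python
from collections import defaultdict
from random import randint


def most_common_with_indices(lst):
    # Hypothetical helper: returns (element, count, indices), or None if lst is empty
    positions = defaultdict(list)
    # Single pass over lst: record every index under its value
    for idx, ele in enumerate(lst):
        positions[ele].append(idx)
    if not positions:
        return None
    # Only the distinct values are scanned here, not the whole list again;
    # on ties, max() keeps the value that appeared first in lst
    ele, idxs = max(positions.items(), key=lambda kv: len(kv[1]))
    return ele, len(idxs), idxs


eles = [randint(0, 10) for _ in range(300000)]
ele, count, idxs = most_common_with_indices(eles)
print(ele, count, idxs[:5])
```

One behavioural difference: on ties mosty returns the smallest value (it walks values in sorted order), while this sketch returns the value that occurs first in the list.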
My questions are:
Is there any way to optimize this code? How?
A lazy sort would surely be the way to go, right? How would I implement a lazy timsort?
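Python's sorted has no lazy mode, and timsort itself is not easy to make lazy. A common substitute is a heap-based lazy sort, which is a different algorithm from timsort. A minimal sketch using the standard heapq module (lazy_sorted is a hypothetical name):

```python
import heapq
from itertools import islice


def lazy_sorted(iterable):
    # Hypothetical helper: yields items in ascending order, one at a time.
    # heapify() costs O(n); each popped item costs O(log n), so taking
    # only the first k items costs O(n + k log n) instead of a full sort.
    heap = list(iterable)  # copy so the caller's data is left untouched
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)


data = [5, 3, 9, 1, 7]
print(list(islice(lazy_sorted(data), 3)))  # [1, 3, 5]
```

In mosty this could replace sorted(...) directly, e.g. lazy_sorted((v, k) for k, v in enumerate(lst)). However, every element is eventually consumed there, so the total work stays O(n log n), and many individual heappop calls are likely slower than a single call to sorted in CPython. Laziness only pays off when you can stop early.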
Answer 0 (score: 2)
I haven't done any benchmarking, but this shouldn't perform badly (even though it iterates over the list twice):
from collections import Counter
from random import randint
eles = [randint(0, 10) for i in range(30)]
counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]
print(most_common_element, number_of_occurrences, indices)
And the indices (the second iteration) can be found lazily with a generator expression:
indices = (i for i, x in enumerate(eles) if x == most_common_element)
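To show what "lazily" buys here, the sketch below (a small made-up list, with itertools.islice used to stop early) pulls only the first two matching positions. The generator does no scanning until it is iterated, and it later resumes where it left off.

```python
from collections import Counter
from itertools import islice

eles = [2, 7, 2, 5, 2, 7, 2]
most_common_element, number_of_occurrences = Counter(eles).most_common(1)[0]

# Nothing is scanned yet; the generator only describes the computation
indices = (i for i, x in enumerate(eles) if x == most_common_element)

# Take just the first two matches; the scan has only reached index 2
first_two = list(islice(indices, 2))
print(first_two)  # [0, 2]

# The same generator continues from where it stopped
rest = list(indices)
print(rest)  # [4, 6]
```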
If several elements can tie for most common and you need all of them, this might be useful:
from collections import Counter
from itertools import groupby
from operator import itemgetter
eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = {key: [] for key in most_common}
for i, x in enumerate(eles):
    if x in indices:
        indices[x].append(i)
print(most_common)
print(indices)
You can of course still make indices lazy, as above.
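Making indices lazy in the multi-element case has one pitfall. A generator expression written directly inside a dict comprehension only looks up key when it is consumed, so by then every generator would see the last key. A minimal sketch that binds each key through a small function (lazy_indices is a hypothetical helper):

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def lazy_indices(seq, target):
    # Hypothetical helper: a generator over the positions of target in seq.
    # Passing target as an argument fixes its value for this generator.
    return (i for i, x in enumerate(seq) if x == target)


eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
counter = Counter(eles)
_count, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)

# One lazy generator per tied element; nothing is scanned yet
indices = {key: lazy_indices(eles, key) for key in most_common}

# Consuming the generators is what actually scans the list
materialized = {key: list(gen) for key, gen in indices.items()}
print(most_common)
print(materialized)
```

Each generator still scans the whole list when consumed, so with k tied elements this trades the single eager pass for k lazy passes.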
Answer 1 (score: 2)
If you're willing to use numpy, you can do the following:
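import numpy as np

arr = np.array(eles)  # eles as defined in the question
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
# np.where returns a tuple with one index array per dimension; take the first
indices = np.where(arr == most_common_elem)[0]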
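Because the question's values come from randint(0, 10) (small non-negative integers), np.bincount is an alternative to np.unique, which sorts the array internally. This is a sketch under that assumption: bincount raises an error for negative values and allocates an array of length max(arr) + 1, so it does not suit arbitrary data.

```python
import numpy as np
from random import randint

eles = [randint(0, 10) for _ in range(300000)]
arr = np.array(eles)

# Assumes non-negative integers: counts[v] is how often v occurs,
# computed in one pass without sorting
counts = np.bincount(arr)
most_common_elem = int(np.argmax(counts))  # first maximum, i.e. smallest value on ties
its_count = int(counts[most_common_elem])
indices = np.flatnonzero(arr == most_common_elem)  # 1-D array of positions
print(most_common_elem, its_count, indices[:5])
```

Tie handling matches the np.unique version: both pick the smallest value among those with the highest count.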
HTH.