Question

Given a list with N (a large number of) elements:

from random import randint

eles = [randint(0, 10) for i in range(3000000)]

I am trying to find the best way (in terms of performance and resource consumption) to implement the following function:

def mosty(lst):
    # Sort (value, index) pairs so that equal values become adjacent.
    pairs = sorted((v, k) for k, v in enumerate(lst))
    count, maxi, last_ele, idxs = 0, 0, None, []
    results = None  # stays None only when lst is empty
    for ele, idx in pairs:
        if last_ele != ele:
            # A new run of equal values starts here.
            count = 0
            idxs = []
        count += 1
        idxs.append(idx)
        if maxi < count:
            results = (ele, count, idxs)
            maxi = count
        last_ele = ele
    return results

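This function returns the most common element, the number of times it occurs, and the indices at which it was found.

Here is a benchmark with 300000 elements.

But I think this could be improved. One reason is Python 3's sorted function (timsort): if it returned a generator, I would not have to loop over the list twice, would I?

My questions are:

Is there any way to optimize this code? How?
A lazy sort would surely help here, right? How could I implement a lazy timsort?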

Answer 1

I have not run any benchmarks, but this should not perform badly (even though it iterates over the list twice):

from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)

The indices (the second iteration) can also be found lazily with a generator expression:

indices = (i for i, x in enumerate(eles) if x == most_common_element)
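To make the laziness concrete, here is a minimal sketch on a small hand-picked list (the sample data and the names first_two and remaining are only for illustration). It assumes you want just a few positions at a time; itertools.islice pulls that many values from the generator and leaves the rest of the scan for later:

```python
from collections import Counter
from itertools import islice

eles = [3, 1, 3, 2, 3, 1, 3]

most_common_element, number_of_occurrences = Counter(eles).most_common(1)[0]

# Generator expression: nothing is scanned until a value is requested.
indices = (i for i, x in enumerate(eles) if x == most_common_element)

# Pull only the first two matching positions.
first_two = list(islice(indices, 2))

# The same generator resumes where it stopped.
remaining = list(indices)

print(most_common_element, number_of_occurrences, first_two, remaining)
```

Keep in mind that a generator can only be consumed once; if you need the indices again, rebuild the generator or store them in a list.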

If you need to handle several elements that are tied for most common, this might work for you:

from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = {key: [] for key in most_common}

for i, x in enumerate(eles):
    if x in indices:
        indices[x].append(i)

print(most_common)
print(indices)

You can of course still make indices lazy, as above.
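One possible shape for a lazy variant of the tied case is sketched below. The helper names tied_most_common and lazy_indices are made up for this example, and the tie detection uses max() over the counts rather than the groupby approach above. Instead of filling a dict of lists up front, it yields (index, value) pairs on demand:

```python
from collections import Counter


def tied_most_common(lst):
    """Return {value: count} for every value sharing the highest count."""
    counter = Counter(lst)
    if not counter:
        return {}
    top = max(counter.values())
    return {value: n for value, n in counter.items() if n == top}


def lazy_indices(lst, wanted):
    """Lazily yield (index, value) pairs for values contained in wanted."""
    return ((i, x) for i, x in enumerate(lst) if x in wanted)


eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

most_common = tied_most_common(eles)
pairs = lazy_indices(eles, most_common)

first = next(pairs)   # only the first element of eles has been inspected
rest = list(pairs)    # consume the remaining matches

print(most_common)
print(first, rest)
```

Yielding pairs keeps a single pass over the list; the caller can group them by value if separate index lists are needed.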

Answer 2

If you are willing to use numpy, you can do the following:

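import numpy as np

arr = np.array(eles)
# Unique values in sorted order, with how often each one occurs.
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
# For a 1-D input, np.where returns a tuple holding one index array.
indices = np.where(arr == most_common_elem)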
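As a self-contained illustration on a small hand-picked list (the sample data and the name where_result are only for this sketch), note that np.where with a single argument returns a tuple of arrays. If you would rather have a flat index array, np.flatnonzero is one way to get it:

```python
import numpy as np

eles = [3, 1, 3, 2, 3, 1, 3]
arr = np.array(eles)

values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]

# Tuple containing one index array, because arr is 1-D.
where_result = np.where(arr == most_common_elem)

# Flat array of the same positions.
indices = np.flatnonzero(arr == most_common_elem)

print(most_common_elem, its_count, indices.tolist())
```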

HTH.
