Lazy sorting function, timsort

Date: 2018-11-16 09:33:34

Tags: python python-3.x performance sorting

Given a list with N (a large number of) elements:

from random import randint

eles = [randint(0, 10) for i in range(3000000)]

I am trying to find the best way (in terms of performance / resource consumption) to implement the following function:

def mosty(lst):
    # sort (value, index) pairs so equal values end up next to each other
    sort = sorted((v, k) for k, v in enumerate(lst))
    count, maxi, last_ele, idxs = 0, 0, None, []
    for ele, idx in sort:
        if last_ele != ele:
            # a new value starts: reset the running count and index list
            count = 1
            idxs = []
        idxs.append(idx)
        if last_ele == ele:
            count += 1
            if maxi < count:
                # best run so far: remember the value, its count and its indices
                results = (ele, count, idxs)
                maxi = count
        last_ele = ele
    return results

This function returns the most common element, the number of times it occurs, and the indices at which it is found.
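
For a small list, for example, a call looks like this (a quick illustration):

>>> mosty([3, 1, 3, 2, 1, 3])
(3, 3, [0, 2, 5])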

Here is a benchmark with 300000 elements.

But I think this can be improved, and one reason is Python 3's sorted function (timsort): if it returned a generator instead of a full list, I wouldn't have to iterate over the list twice, right?
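
(For context: sorted always materializes a full list, so the standard library has no lazy timsort; the usual way to get incremental, pay-as-you-consume sorting is a heap. A minimal sketch of that idea using heapq:)

import heapq

def lazy_sorted(lst):
    # O(n) heapify up front; each element you actually consume costs O(log n)
    heap = list(lst)  # copy so the input list is not modified
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)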

My questions are:

Is there any way to optimize this code? How?
A lazy sort would definitely be the way to go, right? How could I implement a lazy timsort?

2 answers:

Answer 0 (score: 2):

I haven't done any benchmarking, but this shouldn't perform badly (even though it iterates over the list twice):

from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
# most_common(1) returns a list containing the single (element, count) pair
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)

And the indices (the second iteration) can be computed lazily with a generator expression:

indices = (i for i, x in enumerate(eles) if x == most_common_element)
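
You can then consume only as many indices as you actually need, for example with itertools.islice (a small usage sketch, assuming you only want the first ten):

from itertools import islice

first_ten = list(islice(indices, 10))  # pulls just the first ten matching indices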

If you need to care about ties for the most common element, this might work for you:

from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
# most_common() is sorted by count descending, so the first groupby group
# contains every element tied for the maximum count
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = {key: [] for key in most_common}

for i, x in enumerate(eles):
    if x in indices:
        indices[x].append(i)

print(most_common)
print(indices)
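
For the eles list above, this prints (on Python 3.7+, where dicts keep insertion order):

{1: 3, 2: 3, 3: 3}
{1: [0, 1, 2], 2: [3, 4, 5], 3: [6, 7, 8]}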

Of course, you can still make indices lazy, as above.
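
One way to do that is to build one generator per key through a small helper function (a sketch; the helper name is made up, and it exists so that each generator binds its own key instead of the last key from the loop):

def lazy_indices(value):
    # each call gets its own binding of value, so every generator
    # filters on the right key even though it is consumed later
    return (i for i, x in enumerate(eles) if x == value)

indices = {key: lazy_indices(key) for key in most_common}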

Answer 1 (score: 2):

If you are willing to use numpy, you can do the following:

import numpy as np

arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)  # unique values and how often each occurs
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)
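
Note that np.where returns a tuple of index arrays (one per dimension); for a flat array of positions you can take its first element or use np.flatnonzero. A tiny example:

arr = np.array([1, 1, 2, 3, 1])
np.where(arr == 1)        # (array([0, 1, 4]),)
np.flatnonzero(arr == 1)  # array([0, 1, 4])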

HTH。