Question

Let's say I have the following list:

x = ['A','A','B','A','A','A', 'C', 'C', 'A', 'A']

What is the best and most efficient way to produce the following output?

# key = number of consecutives
# val = number of occurrences
>>> func(x, 'A')
{2:2, 3:1}

>>> func(x, 'B')
{1:1}

>>> func(x, 'C')
{2:1}

We can assume the list contains only strings. Any ideas?

Answer 1

The following works, using collections.Counter and itertools.groupby:

from itertools import groupby
from collections import Counter

def func(lst, elmnt):
    return Counter(len(list(g)) for k, g in groupby(lst) if k == elmnt)

>>> func(x, 'A')
Counter({2: 2, 3: 1})
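
To see why this works, it may help to look at the runs that groupby produces before they are counted. The sketch below is only an illustration; the helper name `runs` is hypothetical and not part of the answer.

```python
from itertools import groupby

x = ['A', 'A', 'B', 'A', 'A', 'A', 'C', 'C', 'A', 'A']

def runs(lst):
    # groupby yields one (key, iterator) pair per run of equal adjacent items;
    # consuming the iterator gives the length of that run.
    return [(k, len(list(g))) for k, g in groupby(lst)]

print(runs(x))
# [('A', 2), ('B', 1), ('A', 3), ('C', 2), ('A', 2)]
```

Filtering these pairs on k == 'A' leaves the lengths 2, 3 and 2, which Counter tallies as {2: 2, 3: 1}.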

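While it may not matter for a single call, it is better to build an intermediate data structure that collects the counts for all groups of distinct elements in a single pass, so that subsequent queries for individual elements do not iterate over the whole list again: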

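from collections import Counter, defaultdict
from itertools import groupby

def func(lst):
    # Count each (element, run length) pair in one pass over the list.
    c = Counter((k, len(list(g))) for k, g in groupby(lst))
    d = defaultdict(dict)
    for (k, length), count in c.items():
        d[k][length] = count
    # Return the lookup method; it gives None for elements not in lst.
    return d.get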

>>> f = func(x)  # builds intermediate structure (O(N)), returns function to query it
>>> f('A')  # these calls are now all O(1)
{2: 2, 3: 1}
>>> f('B')
{1: 1}
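
One detail of returning d.get: querying an element that never occurs in the list gives None rather than an empty mapping. If that matters, a variant along these lines might be used. This is a sketch under that assumption, and the name `build_run_index` is made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import groupby

def build_run_index(lst):
    # One pass: count how often each (element, run length) pair occurs.
    pair_counts = Counter((k, len(list(g))) for k, g in groupby(lst))
    index = defaultdict(dict)
    for (k, length), count in pair_counts.items():
        index[k][length] = count
    # Fall back to an empty dict for elements that are not in the list.
    return lambda elmnt: index.get(elmnt, {})

x = ['A', 'A', 'B', 'A', 'A', 'A', 'C', 'C', 'A', 'A']
f = build_run_index(x)
print(f('A'))  # {2: 2, 3: 1}
print(f('Z'))  # {}
```

Using index.get with a default, rather than index[elmnt], avoids inserting an empty entry into the defaultdict on every miss.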

Answer 2

This should work:


Whether it is the best is entirely opinion-based. IMO it is the best, because I wrote it, and in my opinion, I am the best.

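On a more objective note: @schwobaseggl's solution is more concise, but a quick %timeit experiment tells me that my example is 5 times faster, and possibly more for other examples... So 'best' really depends on what you value most. (Even 'efficient' is vague: are you thinking of processing time, memory usage, ...?)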
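
%timeit is an IPython magic; outside IPython, the standard timeit module can run a similar comparison. The sketch below times the groupby version against a hand-written loop. The loop (`count_runs_loop`) is a hypothetical stand-in written for this illustration, not the answerer's code, so it says nothing about the 5x figure quoted above, and timings will vary by machine and input.

```python
import timeit
from collections import Counter
from itertools import groupby

def count_runs_groupby(lst, elmnt):
    return Counter(len(list(g)) for k, g in groupby(lst) if k == elmnt)

def count_runs_loop(lst, elmnt):
    # Hypothetical explicit loop: track the current run of elmnt and
    # record its length whenever the run ends.
    counts = {}
    run = 0
    for item in lst:
        if item == elmnt:
            run += 1
        elif run:
            counts[run] = counts.get(run, 0) + 1
            run = 0
    if run:  # a run that reaches the end of the list
        counts[run] = counts.get(run, 0) + 1
    return counts

x = ['A', 'A', 'B', 'A', 'A', 'A', 'C', 'C', 'A', 'A']
data = x * 1000  # larger input so the timing is less noisy

# Both versions should agree before their speed is compared.
assert count_runs_groupby(data, 'A') == count_runs_loop(data, 'A')

for fn in (count_runs_groupby, count_runs_loop):
    seconds = timeit.timeit(lambda: fn(data, 'A'), number=100)
    print(f"{fn.__name__}: {seconds:.4f} s")
```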

Getting the distribution of consecutive occurrences of some target value
