Python - Counting occurrences of certain ranges in a list

Time: 2012-03-03 06:03:46

Tags: python list count range histogram

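So basically I want to count how many times a float falls within certain ranges in a given list. For example: the user enters a list of grades (all out of 100), and they are grouped in tens. How many scores fall in 0-10, 10-20, 20-30, and so on? Like test scores. I know I can use the count function, but since I'm not looking for a specific number, I'm having trouble. Is there a way to combine count and range? Thanks for your help.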
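The question's idea of combining count with range can be expressed directly: for each range, count the scores that fall inside it. A minimal sketch, assuming scores between 0 and 100 and ranges of width 10 (count_in_ranges is a hypothetical name, not a library function):

```python
def count_in_ranges(scores, width=10, top=100):
    """Hypothetical helper: count scores in each range [lo, lo + width) up to top.
    The last range is closed so that a score equal to top is still counted."""
    counts = {}
    for lo in range(0, top, width):
        hi = lo + width
        is_last = hi >= top
        # Like list.count, but testing membership in a range instead of equality
        counts[(lo, hi)] = sum(
            1 for s in scores if lo <= s < hi or (is_last and s == top)
        )
    return counts

scores = [82, 85, 90, 91, 70, 87, 45]
for (lo, hi), n in sorted(count_in_ranges(scores).items()):
    print("%d-%d: %d" % (lo, hi, n))
```

This rescans the whole list once per range, so it costs O(n × number of ranges); the answers below group the scores in a single pass instead.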

4 answers:

Answer 0 (score: 7)

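To group the data, divide it by the interval width. To count the numbers in each group, consider using collections.Counter. Here is a worked example with documentation and tests: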

from collections import Counter

def histogram(iterable, low, high, bins):
    '''Count elements from the iterable into evenly spaced bins

        >>> scores = [82, 85, 90, 91, 70, 87, 45]
        >>> histogram(scores, 0, 100, 10)
        [0, 0, 0, 0, 1, 0, 0, 1, 3, 2]

    '''
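    step = (high - low + 0.0) / bins  # width of each bin, as a float
    # Bin index = number of whole steps x lies above low. A value equal to
    # high gets index `bins` and, like values outside [low, high), is not returned.
    dist = Counter((float(x) - low) // step for x in iterable)
    return [dist[b] for b in range(bins)]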

if __name__ == '__main__':
    import doctest
    print(doctest.testmod())
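One caveat with histogram() for scores out of 100: a score of exactly 100 gets bin index 10, which the final list comprehension never reads, so it is silently dropped. A minimal sketch of a variant (histogram_inclusive is a hypothetical name) that folds x == high into the last bin, assuming that is the behavior you want:

```python
from collections import Counter

def histogram_inclusive(iterable, low, high, bins):
    """Like histogram(), but the last bin also counts values equal to high."""
    step = (high - low) / float(bins)
    dist = Counter()
    for x in iterable:
        if not low <= x <= high:
            continue  # ignore out-of-range values instead of mis-binning them
        b = int((x - low) // step)
        dist[min(b, bins - 1)] += 1  # fold x == high into the last bin
    return [dist[b] for b in range(bins)]

print(histogram_inclusive([82, 85, 90, 91, 70, 87, 45, 100], 0, 100, 10))
```

Clamping only affects the single value x == high; every other in-range value lands in the same bin as before.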

Answer 1 (score: 6)

If you use the external library NumPy, you can simply call numpy.histogram():

>>> import numpy
>>> data = [82, 85, 90, 91, 70, 87, 45]
>>> counts, bins = numpy.histogram(data, bins=10, range=(0, 100))
>>> counts
array([0, 0, 0, 0, 1, 0, 0, 1, 3, 2])
>>> bins
array([   0.,   10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,
         90.,  100.])
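According to the NumPy documentation, every bin of numpy.histogram is half-open except the last, which also includes its right edge, so a score of 100 is counted in the 90-100 bin. For readers without NumPy, a minimal pure-Python sketch of that same edge rule (histogram_like_numpy is a hypothetical name; edges are assumed to be sorted ascending):

```python
from bisect import bisect_right

def histogram_like_numpy(data, edges):
    """Count data into bins [e0, e1), [e1, e2), ..., [e_{n-1}, e_n];
    the last bin is closed, mirroring numpy.histogram's documented rule."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # values outside the edges are ignored
        if x == edges[-1]:
            i = len(counts) - 1  # the last bin includes its right edge
        else:
            i = bisect_right(edges, x) - 1  # index of the bin whose left edge is <= x
        counts[i] += 1
    return counts

edges = list(range(0, 101, 10))
print(histogram_like_numpy([82, 85, 90, 91, 70, 87, 45], edges))
```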

Answer 2 (score: 4)

decs = [int(x/10) for x in scores]

maps scores 0-9 -> 0, 10-19 -> 1, and so on. Then just count the occurrences of 0, 1, 2, 3, etc. (with something like collections.Counter), and map back to the ranges from there.
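Put together, the steps above can be sketched as follows, assuming non-negative integer scores (so the labels "lo-(lo+9)" are exact) and reusing the sample scores from the other answers:

```python
from collections import Counter

scores = [82, 85, 90, 91, 70, 87, 45]

# Step 1: map each score to its bin index (0-9 -> 0, 10-19 -> 1, ...)
decs = [int(x / 10) for x in scores]

# Step 2: count how often each bin index occurs
tally = Counter(decs)

# Step 3: map each bin index back to a range label; Counter returns 0 for empty bins
report = ["%d-%d: %d" % (d * 10, d * 10 + 9, tally[d]) for d in range(max(tally) + 1)]
print("\n".join(report))
```

With this scheme a score of exactly 100 forms its own bin (index 10, label "100-109").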

Answer 3 (score: 2)

This method uses bisect, which can be more efficient, but it requires sorting the scores first.

from bisect import bisect
import random

scores = [random.randint(0,100) for _ in range(100)]
bins = [20, 40, 60, 80, 100]

scores.sort()
counts = []
last = 0
for range_max in bins:
    # index just past the last score <= range_max, searching from the previous cut
    i = bisect(scores, range_max, last)
    counts.append(i - last)
    last = i
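Because the loop above only needs a sorted list of upper bounds, the bounds do not have to be evenly spaced. A minimal sketch that wraps the loop in a function (count_up_to is a hypothetical name) and uses letter-grade cut-offs chosen purely for illustration:

```python
from bisect import bisect

def count_up_to(scores, bounds):
    """Count scores into ranges whose inclusive upper ends are `bounds`
    (sorted ascending). Scores above bounds[-1] are not counted."""
    scores = sorted(scores)  # bisect only works on sorted data
    counts = []
    last = 0
    for upper in bounds:
        # bisect (an alias of bisect_right) returns the index just past every score <= upper
        i = bisect(scores, upper, last)
        counts.append(i - last)
        last = i
    return counts

# Illustrative cut-offs: F <= 59, D 60-69, C 70-79, B 80-89, A 90-100
bounds = [59, 69, 79, 89, 100]
scores = [82, 85, 90, 91, 70, 87, 45, 100, 60]
for letter, n in zip("FDCBA", count_up_to(scores, bounds)):
    print(letter, n)
```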

I wouldn't install numpy just for this, but if you already have numpy, you can use numpy.histogram.

Update

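First, using bisect is more flexible. Using [i//n for i in scores] requires all bins to be the same size, whereas bisect allows the bins to have arbitrary limits. Also, i//n means the ranges are [lo, hi). With bisect the ranges are (lo, hi], but if you want [lo, hi) you can use bisect_left.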
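The boundary difference only matters for scores that sit exactly on a bin limit. A small sketch that runs the same counting loop with both functions on such scores (bin_counts is a hypothetical helper; the data is made up for illustration):

```python
from bisect import bisect_left, bisect_right

def bin_counts(sorted_scores, bounds, find):
    """Count sorted_scores between successive bounds, using `find`
    (bisect_right or bisect_left) to locate each cut point."""
    counts, last = [], 0
    for b in bounds:
        i = find(sorted_scores, b, last)
        counts.append(i - last)
        last = i
    return counts

scores = sorted([0, 10, 15, 20, 30])
bounds = [10, 20, 30]
right = bin_counts(scores, bounds, bisect_right)  # ranges (-inf, 10], (10, 20], (20, 30]
left = bin_counts(scores, bounds, bisect_left)    # ranges (-inf, 10), [10, 20), [20, 30)
print(right, left)
```

With bisect_left the score 10 moves into the second range and 20 into the third, and 30 is not counted at all, which matches what [i//10 for i in scores] does for the first three bins.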

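Second, bisect is faster; see the timings below. I've replaced scores.sort() with the slower sorted(scores), because sorting is the slowest step and I didn't want to bias the timings with a pre-sorted array. However, the OP says his/her array is already sorted, in which case bisect may make even more sense.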

setup="""
from bisect import bisect_left
import random
from collections import Counter

def histogram(iterable, low, high, bins):
    step = (high - low) / bins
    dist = Counter(((x - low + 0.) // step for x in iterable))
    return [dist[b] for b in range(bins)]

def histogram_bisect(scores, groups):
    scores = sorted(scores)
    counts = []
    last = 0
    for range_max in groups:
        i = bisect_left(scores, range_max, last)
        counts.append(i - last)
        last = i
    return counts

def histogram_simple(scores, bin_size):
    scores = [i//bin_size for i in scores]
    return [scores.count(i) for i in range(max(scores)+1)]

scores = [random.randint(0,100) for _ in range(100)]
bins = range(10, 101, 10)
"""
from timeit import repeat
t = repeat('C = histogram(scores, 0, 100, 10)', setup=setup, number=10000)
print(min(t))
# .95 seconds
t = repeat('C = histogram_bisect(scores, bins)', setup=setup, number=10000)
print(min(t))
# .22 seconds
t = repeat('histogram_simple(scores, 10)', setup=setup, number=10000)
print(min(t))
# .36 seconds