Question

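I want to group a list of data according to a list of ranges. The idea is to build a histogram from the final result. I know about collections.Counter, but I haven't seen anyone use it, or any other built-in, to produce clumps. I've written it out the long way, but I'm hoping someone can suggest something more efficient.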

def min_to_sec(val):
    ret_val = 60 * int(val)
    return ret_val

def hr_to_sec(val):
    ret_val = 3600 * int(val)
    return ret_val

def histogram(y_lst):
    x_lst = [   10,
                20,
                30,
                40,
                50,
                60,
                90,
                min_to_sec(2),
                min_to_sec(3),
                min_to_sec(4),
                min_to_sec(5),
                min_to_sec(10),
                min_to_sec(15),
                min_to_sec(20),
            ]

    results = {}    
    for y_val in y_lst:
        for x_val in x_lst:
            if y_val < x_val:
                results[ str(x_val) ] = results.get( str(x_val), 0) + 1
                break
        else:        
            results['greater'] = results.get('greater', 0) + 1
    return results

Updated to include an example with the desired sample output:

So if my x_lst and y_lst are:

x_lst = [10,20,30,40]
y_lst = [1,2,3,15,22,27,40]

I'd like a return value similar to a Counter:

{
    10:3,
    20:1,
    30:2,
}

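So although my code above works, it is a nested for loop and it is slow. I was hoping there is a way to do this kind of 'clumping' operation with something like collections.Counter.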

Answer 1

You can use collections.Counter for this kind of counting of the elements in a list:

In [1]: from collections import Counter

In [2]: Counter([1, 2, 10, 1, 2, 100])
Out[2]: Counter({1: 2, 2: 2, 100: 1, 10: 1})

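You can also increment a count more simply, without results.get(). This works when results is a Counter, where a missing key counts as 0; on a plain dict it would raise KeyError: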

results['foo'] += 1
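
As a minimal sketch of that increment (the keys 'foo' and 'bar' are placeholders for illustration only), a Counter can be bumped directly because a key that has not been seen yet reads as 0:

```python
from collections import Counter

results = Counter()

# No .get(key, 0) needed: a missing key is treated as 0 before the increment.
results['foo'] += 1
results['foo'] += 1
results['bar'] += 1

print(results)
```

Reading a key that was never incremented, such as results['baz'], returns 0 and does not add the key to the Counter.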

To count only the elements that come before the inequality first fails, you can use itertools.takewhile:

In [3]: from itertools import takewhile 

In [4]: Counter(takewhile(lambda x: x < 10, [1, 2, 10, 1, 2, 100]))
Out[4]: Counter({1: 1, 2: 1})

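However, takewhile stops at the first element that fails the test, so this does not keep track of anything after that point.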
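
One way to count every element, including those past the last edge, is to swap the inner loop for a binary search. The following is only a sketch under a few assumptions: the function name clump is made up for illustration, the edges must already be sorted in ascending order, and the keys are kept as integers rather than the strings used in the question's code.

```python
from bisect import bisect_right
from collections import Counter

def clump(values, edges):
    """Count values under the first edge that is strictly greater than them.

    `edges` must be sorted ascending. Values that are not below any edge
    are counted under the key 'greater', as in the question's code.
    """
    counts = Counter()
    for v in values:
        # Index of the first edge strictly greater than v (same as y_val < x_val).
        i = bisect_right(edges, v)
        counts[edges[i] if i < len(edges) else 'greater'] += 1
    return counts

x_lst = [10, 20, 30, 40]
y_lst = [1, 2, 3, 15, 22, 27, 40]
result = clump(y_lst, x_lst)
print(result)
```

For the sample lists this counts 3 under 10, 1 under 20, 2 under 30, and 1 under 'greater' (the value 40 is not less than 40). Each lookup is a binary search over the edges instead of a linear scan.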

Answer 2

Have you considered using pandas? You can put y_lst into a DataFrame and easily make a histogram.

Assuming you have matplotlib and pylab imported ...

import pandas as pd
data = pd.DataFrame([1, 2, 3, 15, 22, 27, 40])
data[0].hist(bins = 4)

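This will give you a histogram of the data described above; note that bins = 4 splits the data range into four equal-width bins, and a list of bin edges can be passed instead if you need specific boundaries. Once the data is in a pandas DataFrame, slicing it to your liking is not too challenging.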
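
If you want the counts per range rather than a plot, pd.cut is one option. This is a sketch, not the only way to do it: the infinite outer edges and the string labels ("<10", "greater", and so on) are choices made here for illustration.

```python
import pandas as pd

x_lst = [10, 20, 30, 40]
y_lst = [1, 2, 3, 15, 22, 27, 40]

# Bin edges: an open lower bound, the x_lst edges, and a catch-all upper bound.
edges = [float("-inf")] + x_lst + [float("inf")]

# One label per bin, named after its upper edge (illustrative naming).
labels = [f"<{x}" for x in x_lst] + ["greater"]

# right=False makes each bin [low, high), matching the y_val < x_val test.
binned = pd.cut(pd.Series(y_lst), bins=edges, labels=labels, right=False)
counts = binned.value_counts(sort=False)
print(counts)
```

Calling counts.plot(kind="bar") would then draw the histogram from these counts, assuming matplotlib is installed.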

Sorting data into chunks with Python
