Grouping data into intervals of a given size that satisfy a condition

Time: 2012-10-28 16:13:25

Tags: python group-by itertools

I want to sort the items in this list into new lists...

truc = [['12', 'brett', 5548],
       ['22.3', 'troy', 9514],
       ['8.1', 'hings', 12635],
       ['34.2', 'dab', 17666],
       ['4q3', 'sigma', 18065],
       ['4q3', 'delta', 18068]]

...using the last field to group them into bins of 3500. So the ideal result would be:

firstSort = [['34.2', 'dab', 17666],
            ['4q3', 'sigma', 18065],
            ['4q3', 'delta', 18068]]

secondSort = [['22.3', 'troy', 9514],
             ['8.1', 'hings', 12635]]

lastSort = ['12', 'brett', 5548]

I tried using the itertools.groupby() function, but I could not find a way to specify the bin size.

3 answers:

Answer 0: (score: 3)

This is trivial without itertools.

truc = [['12', 'brett', 5548],
       ['22.3', 'troy', 9514],
       ['8.1', 'hings', 12635],
       ['34.2', 'dab', 17666],
       ['4q3', 'sigma', 18065],
       ['4q3', 'delta', 18068]]

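# Sort by the last field so rows with nearby values end up adjacent
truc.sort(key=lambda a: a[-1])
groups = [[]]
last_row = None
for row in truc:
    # Start a new group when the gap to the previous row exceeds 3500
    if last_row is not None and row[-1] - last_row[-1] > 3500:
        groups.append([])
    last_row = row
    groups[-1].append(row)

import pprint
pprint.pprint(groups)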

Output:

[[['12', 'brett', 5548]],
 [['22.3', 'troy', 9514], ['8.1', 'hings', 12635]],
 [['34.2', 'dab', 17666], ['4q3', 'sigma', 18065], ['4q3', 'delta', 18068]]]
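
The same loop can be wrapped in a reusable function. The following is only a minimal sketch: the name `group_by_gap` and its `max_gap` and `key` parameters are hypothetical and do not come from the answer. It sorts a copy of the input instead of sorting in place.

```python
def group_by_gap(rows, max_gap, key=lambda row: row[-1]):
    """Split rows into groups whose consecutive sorted keys differ by at most max_gap."""
    groups = []
    previous = None
    for row in sorted(rows, key=key):
        # Open a new group for the first row, or when the gap is too large.
        if previous is None or key(row) - key(previous) > max_gap:
            groups.append([])
        groups[-1].append(row)
        previous = row
    return groups


truc = [['12', 'brett', 5548],
        ['22.3', 'troy', 9514],
        ['8.1', 'hings', 12635],
        ['34.2', 'dab', 17666],
        ['4q3', 'sigma', 18065],
        ['4q3', 'delta', 18068]]

groups = group_by_gap(truc, 3500)
```

Note that this compares each row only with the one before it, so a group can span more than 3500 in total when its rows form a chain (for example 0, 3000, 6000 would end up in one group).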

Answer 1: (score: 1)

A basic binning pass with groupby:

from itertools import groupby
from math import floor

# data must be sorted

data = [ ['12', 'brett', 5548],
       ['22.3', 'troy', 9514],
       ['8.1', 'hings', 12635],
       ['34.2', 'dab', 17666],
       ['4q3', 'sigma', 18065],
       ['4q3', 'delta', 18068] ]

groups = []
for k, g in groupby(data, lambda x: floor(x[-1]/3500)):
    groups.append(list(g))

print(groups)

This returns:

[
    [
        ['12', 'brett', 5548]
    ],
    [
        ['22.3', 'troy', 9514]
    ],
    [
        ['8.1', 'hings', 12635]
    ],
    [
        ['34.2', 'dab', 17666],
        ['4q3', 'sigma', 18065],
        ['4q3', 'delta', 18068]
    ]
]

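You can then merge adjacent groups whenever the maximum value in one group minus the minimum value in the neighboring group is less than 3500. That gives: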

[
    [
        ['12', 'brett', 5548]
    ],
    [
        ['22.3', 'troy', 9514],
        ['8.1', 'hings', 12635]
    ],
    [
        ['34.2', 'dab', 17666],
        ['4q3', 'sigma', 18065],
        ['4q3', 'delta', 18068]
    ]
]

Even with merging after the groupby, I think Anurag Uniyal's solution will still produce better groupings in the average case.
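
The merge step is described in this answer but not shown. The following is a minimal sketch of one possible reading of it, assuming that two adjacent bins are merged when the largest value of the later bin minus the smallest value of the earlier group is below the bin width. The name `bin_then_merge` and its parameters are hypothetical.

```python
from itertools import groupby


def bin_then_merge(rows, width, key=lambda row: row[-1]):
    """Bin rows into fixed-width intervals, then merge close adjacent bins."""
    ordered = sorted(rows, key=key)  # groupby only groups consecutive items
    bins = [list(g) for _, g in groupby(ordered, lambda row: key(row) // width)]

    merged = []
    for current in bins:
        # Rows are sorted, so current[-1] is the bin's maximum and
        # merged[-1][0] is the minimum of the group built so far.
        if merged and key(current[-1]) - key(merged[-1][0]) < width:
            merged[-1].extend(current)
        else:
            merged.append(current)
    return merged


data = [['12', 'brett', 5548],
        ['22.3', 'troy', 9514],
        ['8.1', 'hings', 12635],
        ['34.2', 'dab', 17666],
        ['4q3', 'sigma', 18065],
        ['4q3', 'delta', 18068]]

groups = bin_then_merge(data, 3500)
```

With this criterion no merged group spans 3500 or more, which differs from the gap-based approach where groups can grow by chaining.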

Answer 2: (score: 0)

Using defaultdict():

lis=[['12', 'brett', 5548],
      ['22.3', 'troy', 9514],
      ['8.1', 'hings', 12635],
      ['34.2', 'dab', 17666],
      ['4q3', 'sigma', 18065],
      ['4q3', 'delta', 18068]]

from collections import defaultdict
d=defaultdict(list)
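for i, x in enumerate(lis):
    for group in d.values():
        # Join the first existing group holding a value within 3500 of x
        if any(abs(z[-1] - x[-1]) <= 3500 for z in group):
            group.append(x)
            break
    else:
        # No existing group is close enough: start a new one keyed by index
        d[i].append(x)
print(list(d.values()))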

Output:

[[['12', 'brett', 5548]],
 [['22.3', 'troy', 9514], ['8.1', 'hings', 12635]], 
 [['34.2', 'dab', 17666], ['4q3', 'sigma', 18065], ['4q3', 'delta', 18068]]
]