Dynamically changing the groupby key

Date: 2014-06-05 12:11:54

Tags: python

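I need to split a list of probabilities, sorted in descending order, into groups. The first group contains the probabilities in (0.5, 1), the second those in (0.25, 0.5), and so on.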

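I have written some code that splits a list of powers of two less than 1 into two lists: one containing the members greater than or equal to 0.5, and the other containing the members of the (original) list less than 0.5.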

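from itertools import groupby
from operator import itemgetter
import doctest

N = 10

# Powers of two below 1: 0.5, 0.25, ..., 2**-N
twos = [2**(-(i+1)) for i in range(0, N)]

def split_by_prob(items, cutoff):
    """
    (list of float, float) -> list of (lists of float)
    Splits a sorted list into a run of items >= cutoff and a run of items < cutoff.
    >>> split_by_prob(twos, 0.5)
    [[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
    """
    groups = []
    # groupby starts a new group each time the key (x < cutoff) changes value;
    # enumerate yields (index, x) pairs and itemgetter(1) extracts x again.
    for k, g in groupby(enumerate(items), lambda pair: pair[1] < cutoff):
        groups.append(list(map(itemgetter(1), g)))
    return groups

if __name__ == "__main__":
    doctest.testmod()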

Calling this code from the command line does indeed give:

>>> g = split_by_prob(twos, 0.5)
>>> g
[[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
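Note that itertools.groupby only merges consecutive items with equal keys, which is why the input has to be sorted. A minimal sketch showing this, using a hypothetical simplified variant (split_two_ways) that passes the items to groupby directly instead of going through enumerate and itemgetter:

```python
from itertools import groupby

def split_two_ways(items, cutoff):
    # Hypothetical helper: the key is True for items below the cutoff, and
    # groupby starts a new group every time the key changes between neighbours.
    return [list(g) for _, g in groupby(items, lambda x: x < cutoff)]

# Sorted (descending) input: one run >= 0.5, then one run < 0.5.
print(split_two_ways([0.75, 0.5, 0.25, 0.125], 0.5))
# → [[0.75, 0.5], [0.25, 0.125]]

# Unsorted input: the key flips back and forth, so every item is its own group.
print(split_two_ways([0.25, 0.75, 0.125, 0.5], 0.5))
# → [[0.25], [0.75], [0.125], [0.5]]
```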

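My question: how can I change the cutoff on each iteration? That is, if I pass the function a list of cutoffs (e.g. cutoffs = [0.5, 0.125, 0.0625]), I want to get back a list of lists, each of which collects the members of the original list that fall into the corresponding category. In this case the returned groups would be something like [[0.5], [0.25, 0.125], [0.0625], [0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]].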
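A sketch of one possible approach, offered as an assumption rather than the asker's code: instead of a True/False key, give groupby a key that returns a band number, computed here with a binary search over the cutoffs using the standard bisect module. The function name split_by_cutoffs is hypothetical.

```python
from bisect import bisect_right
from itertools import groupby

def split_by_cutoffs(items, cutoffs):
    """Group a descending-sorted list into bands delimited by cutoffs.

    Band 0 holds items >= the largest cutoff, band 1 items below the largest
    but >= the second largest, and so on. Bands with no items are skipped.
    """
    ascending = sorted(cutoffs)  # bisect requires ascending order

    def band(x):
        # Number of cutoffs strictly greater than x.
        return len(ascending) - bisect_right(ascending, x)

    return [list(g) for _, g in groupby(items, key=band)]

twos = [2 ** -(i + 1) for i in range(10)]
print(split_by_cutoffs(twos, [0.5, 0.125, 0.0625]))
# → [[0.5], [0.25, 0.125], [0.0625], [0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
```

Because groupby only sees bands that actually occur, a band with no members is simply absent from the result rather than appearing as an empty list.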

2 answers:

Answer 0 (score: 1)

If I understand you correctly, you can iterate over a list of cutoffs and use x < i for each cutoff i.

from itertools import groupby
from operator import itemgetter

twos = [2**(-(i+1)) for i in range(0, 10)]
cutoffs = [0.5, 0.125, 0.0625]

def split_by_prob(items, cutoffs):
    """
    (list of float, list of float) -> list of (lists of float)
    Splits the list once per cutoff and concatenates the results.
    """
    groups = []
    for i in cutoffs:
        # Each cutoff regroups the whole list, so items appear once per cutoff
        for k, g in groupby(enumerate(items), lambda pair: pair[1] < i):
            groups.append(list(map(itemgetter(1), g)))
    return groups

print(split_by_prob(twos, cutoffs))


[[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625], [0.5, 0.25, 0.125], [0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625], [0.5, 0.25, 0.125, 0.0625], [0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
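The nested loop in this answer regroups the whole list once per cutoff, which is why its output repeats items and has six groups instead of the four the question asks for. A minimal sketch, assuming the same descending-sorted input, folds the per-cutoff comparison x < i into a single key that counts how many cutoffs lie above x. This reworked split_by_prob is hypothetical and not part of the answer.

```python
from itertools import groupby

twos = [2 ** -(i + 1) for i in range(10)]
cutoffs = [0.5, 0.125, 0.0625]

def split_by_prob(items, cutoffs):
    # Fold the per-cutoff test x < c into one key: the number of cutoffs
    # above x. Each item gets a single band number, so it lands in one group.
    def key(x):
        return sum(x < c for c in cutoffs)
    return [list(g) for _, g in groupby(items, key)]

print(split_by_prob(twos, cutoffs))
# → [[0.5], [0.25, 0.125], [0.0625], [0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
```

This makes len(cutoffs) comparisons per item, which is cheap for a handful of cutoffs.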

Answer 1 (score: 0)

I've worked out what I need to do; the complete code is below. I'm not sure how efficient or Pythonic it is:

from itertools import groupby
from operator import itemgetter
import doctest

N = 10

twos = [2**(-(i+1)) for i in range(0, N)]
cutoffs = [0.5, 0.125, 0.03125]

def split_by_prob(items, cutoff, groups):
    """
    (list of float, float, list) -> list of (lists of float)
    Splits a sorted list into runs of items >= cutoff and items < cutoff,
    appending the runs to groups.
    >>> split_by_prob(twos, 0.5, [])
    [[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
    """
    for k, g in groupby(enumerate(items), lambda pair: pair[1] < cutoff):
        groups.append(list(map(itemgetter(1), g)))
    return groups

def split_into_groups(items, cutoffs):
    """
    (list of float, list of float) -> list of (lists of float)
    Splits a sorted list into subsets, one per interval between cutoffs.
    >>> split_into_groups(twos, cutoffs)
    [[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
    """
    groups = items
    final = []
    for i in cutoffs:
        # Assumes each cutoff splits the remaining items into exactly two runs:
        # the run at or above the cutoff is kept, the run below is split further.
        groups = split_by_prob(groups, i, [])
        final.append(groups[0])
        groups = groups.pop()
    final.append(groups)
    return final

if __name__ == "__main__":
    doctest.testmod()
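In this answer, split_into_groups relies on every call to split_by_prob returning exactly two runs. If a cutoff does not actually split the remaining items (for example, a cutoff larger than every remaining item), split_by_prob returns a single run, which is appended to final and also carried forward, so those items appear twice in the result. A minimal alternative sketch, assuming both items and cutoffs are sorted in descending order, walks the list once with an index into the cutoffs instead of re-splitting the remainder. The name split_into_bands is hypothetical, not part of the answer.

```python
def split_into_bands(items, cutoffs):
    """Split a descending-sorted list into len(cutoffs) + 1 groups.

    Assumes both items and cutoffs are sorted in descending order.
    Group i holds the items below cutoffs[i - 1] (if any) and >= cutoffs[i]
    (if any); groups with no members are kept as empty lists.
    """
    groups = [[]]
    k = 0  # index of the next cutoff that has not been passed yet
    for x in items:
        # Move past every cutoff that x falls below; one item may cross several.
        while k < len(cutoffs) and x < cutoffs[k]:
            k += 1
            groups.append([])
        groups[-1].append(x)
    # Cutoffs below the smallest item still get their (empty) groups.
    while len(groups) < len(cutoffs) + 1:
        groups.append([])
    return groups

twos = [2 ** -(i + 1) for i in range(10)]
print(split_into_bands(twos, [0.5, 0.125, 0.03125]))
# → [[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
print(split_into_bands([0.3, 0.2], [0.5, 0.25, 0.1]))
# → [[], [0.3], [0.2], []]
```

Keeping empty groups means the result always has len(cutoffs) + 1 entries, so each group can be matched to its interval by position; filter them out with a list comprehension if that is not wanted.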