Splitting a Python tuple into subtuples with a capacity limit, in a functional programming style

Date: 2016-10-30 16:24:25

Tags: python functional-programming itertools

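I have a tuple in Python and a capacity limit, for example 5. I want to split the tuple into subtuples whose element sums are bounded by that limit:

For example:

input: (3, 1, 4, 2, 2, 1, 1, 2) and capacity = 5
output: (3, 1) (4) (2, 2, 1) (1, 2) # each subtuple sums to at most 5, order is preserved.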

I'm looking for a nice, expressive solution to this task, preferably in a functional programming style (e.g. using itertools.dropwhile or something similar).
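To make the requirement concrete, here is a minimal sketch of a checker for a candidate split. The name is_valid_split is hypothetical and not part of the question. It only verifies the two stated constraints (order preserved, each subtuple sum within capacity); it does not check that the subtuples are as full as possible.

```python
from itertools import chain

def is_valid_split(original, chunks, capacity):
    """Hypothetical helper: check a candidate split against the two constraints."""
    # Concatenating the chunks must give back the input in the same order.
    same_order = tuple(chain.from_iterable(chunks)) == tuple(original)
    # Every chunk must stay within the capacity.
    within_capacity = all(sum(chunk) <= capacity for chunk in chunks)
    return same_order and within_capacity

print(is_valid_split((3, 1, 4, 2, 2, 1, 1, 2),
                     [(3, 1), (4,), (2, 2, 1), (1, 2)], 5))
```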

10 answers:

Answer 0 (score: 14)

You can encapsulate the non-functional part and call it from functional code:

from itertools import groupby

class GroupBySum:
    def __init__(self, maxsum):
        self.maxsum = maxsum
        self.index = 0
        self.sum = 0

    def __call__(self, value):
        self.sum += value
        if self.sum > self.maxsum:
            self.index += 1
            self.sum = value
        return self.index

# Example:

for _, l in groupby((3, 1, 4, 2, 2, 1, 1, 2), GroupBySum(5)):
    print(list(l))

Answer 1 (score: 6)

I couldn't help it, but I wrote something close to what I would do in Haskell (while still being somewhat pythonic, I think):

def take_summed(xs, cap):
    # Stop when nothing is left or the next element no longer fits.
    # (Assumes every single element is <= the overall capacity.)
    if not xs or xs[0] > cap:
        return (), tuple(xs)

    x, *rest = xs
    init, tail = take_summed(rest, cap - x)
    return (x,) + init, tail

def split(xs, cap=5):
    if len(xs) <= 1:
        yield xs
    else:
        chunk, rest = take_summed(xs, cap)
        yield chunk

        if rest != ():
            yield from split(rest, cap)

Never hesitate to decompose a function into subproblems. The result:

In [45]: list(split((3, 1, 4, 2, 2, 1, 1, 2), 5))
Out[45]: [(3, 1), (4,), (2, 2, 1), (1, 2)]

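The difficulty in making this shorter is not that it is infeasible without side effects, but that you have to carry additional accumulated state around. So even with reduce you would need to invent something fairly complex to pass the running sum from one application to the next.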
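To illustrate the point about carrying state, here is a minimal sketch using functools.reduce. The names split_with_reduce and step are hypothetical; the accumulator is a triple (finished chunks, current chunk, current sum), which is exactly the extra state the paragraph above refers to.

```python
from functools import reduce

def split_with_reduce(xs, cap=5):
    """Hypothetical helper: greedy split where all state lives in the accumulator."""
    def step(state, x):
        done, current, total = state
        # Close the current chunk if adding x would exceed the capacity.
        if current and total + x > cap:
            return done + (current,), (x,), x
        return done, current + (x,), total + x

    done, current, _ = reduce(step, xs, ((), (), 0))
    # Append the last open chunk, if any.
    return done + (current,) if current else done

print(split_with_reduce((3, 1, 4, 2, 2, 1, 1, 2), 5))
# -> ((3, 1), (4,), (2, 2, 1), (1, 2))
```

No mutation is involved, but the step function has to thread three values through every call, which is why the result is not noticeably shorter.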

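Answer 2 (score: 4)

Here is a slightly different approach from @Jean's: it slices the input tuple instead of building smaller lists with append, which gives a small performance boost: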

def group_by_capacity(tup, capacity=5):
    t = iter(tup)
    curr, s =  0, next(t)

    for i, v in enumerate(t, 1):
        if s + v  > capacity:
            yield tup[curr:i]
            curr  = i
            s = v
        else:
            s += v
    yield tup[curr:]

>>> list(group_by_capacity((3, 1, 4, 2, 2, 1, 1, 2)))
[(3, 1), (4,), (2, 2, 1), (1, 2)]

Some timings:

In [35]: from random import randrange

In [36]: start = tuple((randrange(1,5) for _ in range(100000)))

In [37]: %%timeit
   ....: list(group_by_capacity(start))
   ....:
10 loops, best of 3: 47.4 ms per loop

In [38]: %%timeit
   ....: list(generate_tuple(start))
   ....:
10 loops, best of 3: 61.1 ms per loop

Answer 3 (score: 4)

I'm a bit surprised that no one has used itertools.accumulate with a key function yet. Anyway, my entry:

from itertools import groupby, accumulate

def sumgroup(seq, capacity):
    divided = accumulate(enumerate(seq),
                         lambda x,y: (x[0],x[1]+y[1])
                                     if x[1]+y[1] <= capacity else (x[0]+1,y[1]))
    seq_iter = iter(seq)
    grouped = groupby(divided, key=lambda x: x[0])
    return [[next(seq_iter) for _ in g] for _,g in grouped]

There are many variants, e.g. you could use zip(seq, divided) to avoid seq_iter, etc., but this was the first way that came to mind. It gives me

In [105]: seq = [3, 1, 4, 2, 2, 1, 1, 2]

In [106]: sumgroup(seq, 5)
Out[106]: [[3, 1], [4], [2, 2, 1], [1, 2]]

and agrees with the GroupBySum result:

In [108]: all(sumgroup(p, 5) == [list(l) for _, l in groupby(p, GroupBySum(5))]
     ...:     for width in range(1,8) for p in product(range(1,6), repeat=width))
     ...:     
     ...: 
Out[108]: True
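The zip(seq, divided) variant mentioned above could look roughly like this. It is only a sketch: the name sumgroup_zip is hypothetical, and the input is materialized into a list first so that it can be traversed twice.

```python
from itertools import accumulate, groupby

def sumgroup_zip(seq, capacity):
    """Hypothetical variant of sumgroup that pairs values with their labels via zip."""
    seq = list(seq)
    # Each label is (group index, running sum of the current group).
    labels = accumulate(
        ((0, value) for value in seq),
        lambda acc, cur: (acc[0], acc[1] + cur[1])
        if acc[1] + cur[1] <= capacity
        else (acc[0] + 1, cur[1]),
    )
    # Group the (value, label) pairs by group index and keep only the values.
    grouped = groupby(zip(seq, labels), key=lambda pair: pair[1][0])
    return [[value for value, _ in group] for _, group in grouped]

print(sumgroup_zip([3, 1, 4, 2, 2, 1, 1, 2], 5))
# -> [[3, 1], [4], [2, 2, 1], [1, 2]]
```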

Answer 4 (score: 3)

While waiting for a first answer, here is a slightly functional approach:

start = (3, 1, 4, 2, 2, 1, 1, 2)

def generate_tuple(inp):
    current_sum = 0
    current_list = []
    for e in inp:
        if current_sum + e <= 5:
            current_list.append(e)
            current_sum += e
        else:
            if current_list:  # fixes "6" in first position empty tuple bug
                yield tuple(current_list)
            current_list = [e]
            current_sum = e
    yield tuple(current_list)

print([i for i in generate_tuple(start)])

Result:

[(3, 1), (4,), (2, 2, 1), (1, 2)]

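Edit: I found a fully functional approach that relies on a memory effect, since it is not feasible otherwise. It's ugly, and it only hurts when I think about how I would explain it clearly. I've spiced up the input dataset a bit, or it would be too easy: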

start = (6, 7, 3, 1, 4, 2, 2, 1, 1, 2, 3, 1 ,3, 1, 1)

Now the code. 3 lines; get some aspirin, you'll need it as much as I did:

mem=[0,0]
start = start + (5,)
print([start[mem[-2]:n] for i in range(0,len(start)) for n in range(i+1,len(start)) if ((n==i+1 and start[i]>=5) or (sum(start[mem[-1]:n])<=5 and sum(start[mem[-1]:n+1])>5)) and not mem.append(n)])

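I'll try to explain.

  • I use a memory effect because it isn't possible without one. It is stored in mem and set to 0,0 at the start.
  • Since the expression ignores the last item, I modify the input data by appending the threshold value, so that the preceding values are not dropped.
  • The only simple part is computing the 2 sums and detecting the index at which the threshold is exceeded. When that happens, both conditions are met and the third one is triggered: storing the index in mem. Since append returns None, the last condition (not mem.append(n)) is always true.
  • ((n==i+1 and start[i]>=5) detects single values greater than or equal to 5.
  • The rest is some fine-tuning until the output matched that of the procedural approach, which doesn't look so bad now :)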
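The steps above can be sketched more readably as follows. This is not the one-liner itself, only a sketch of the same idea under the assumption that a plain loop is acceptable: record the indices where the running sum would exceed the threshold, then slice the input at those indices. The names cut_points and split_at are hypothetical.

```python
def cut_points(values, cap=5):
    """Hypothetical helper: indices at which a new subtuple starts."""
    cuts, total = [0], 0
    for n, v in enumerate(values):
        # Start a new chunk at n if the current chunk is non-empty
        # and adding v would exceed the threshold.
        if n > cuts[-1] and total + v > cap:
            cuts.append(n)
            total = 0
        total += v
    return cuts

def split_at(values, cuts):
    """Hypothetical helper: slice values at the recorded indices."""
    bounds = cuts + [len(values)]
    return [values[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

start = (6, 7, 3, 1, 4, 2, 2, 1, 1, 2, 3, 1, 3, 1, 1)
print(split_at(start, cut_points(start)))
```

Values larger than the threshold, such as 6 and 7 here, end up in subtuples of their own, as in the procedural generate_tuple version.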

Answer 5 (score: 1)

Not sure why you need them all in tuples, but if you don't, you can just remove the tuple(...) casts.

An example:

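def chunkit(tpl, capacity):
    ret = []
    cur = []
    for x in tpl:
        # Close the current chunk only if it is non-empty, so that an
        # oversized first element does not produce an empty tuple.
        if cur and sum(cur) + x > capacity:
            ret.append(tuple(cur))
            cur = [x]
        else:
            cur.append(x)
    if cur:
        ret.append(tuple(cur))

    return tuple(ret)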

Answer 6 (score: 1)

Not sure whether this counts as functional, but it's the closest I could think of:

import itertools

def groupLimit(iterable, limit):
    i, cSum = 0, 0
    def pred(x):
        nonlocal i, cSum, limit
        i, cSum = (i + 1, x) if (x + cSum) > limit else (i, cSum + x)
        return i if x <= limit else -1
    return (tuple(g) for k, g in itertools.groupby(iterable, pred) if k != -1)

This also filters out single values that are greater than the limit. If that is not intended, change the last two lines to:

        return i
    return (tuple(g) for k, g in itertools.groupby(iterable, pred))

Example:

t = (3, 1, 6, 2, 2, 1, 1, 2)
a = groupLimit(t,5)
print(tuple(a))
# version 1 -> ((3, 1), (2, 2, 1), (1, 2))
# version 2 -> ((3, 1), (6,), (2, 2, 1), (1, 2))

Answer 7 (score: 1)

Let's use itertools.

Define powerset:

from itertools import chain, combinations

def powerset(lst):
    for subset in chain.from_iterable(combinations(lst, r) for r in range(len(lst)+1)):
        yield subset

Then we can do it in a one-liner:

[subset for subset in powerset(input) if sum(subset)<=capacity]

Answer 8 (score: 1)

A more general solution:

def groupwhile(iterable,predicate,accumulator_function):
    continue_group = False
    iterator = iter(iterable)
    try:
        accumulated = next(iterator)
    except StopIteration:
        return
    current_group = [accumulated]
    for item in iterator:
        continue_group = predicate(accumulated,item)
        if continue_group:
            current_group.append(item)
            accumulated = accumulator_function(accumulated,item)
        else:
            yield current_group
            accumulated = item
            current_group = [item]

    yield current_group

#your case
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_sum,item: previous_sum + item <= 5,
    lambda previous_sum,item: previous_sum + item,
))) == [[3, 1], [4], [2, 2, 1], [1, 2]]

#equivalent to groupby with key not set
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item,item: previous_item == item,
    lambda _,item: item,
))) == [[3], [1], [4], [2, 2], [1, 1], [2]]

#break on duplicates
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item,item: previous_item != item,
    lambda _,item: item,
))) == [[3, 1, 4, 2], [2, 1], [1, 2]]

#start new group when the number is one
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda _,item: item != 1,
    lambda _1,_2: None,
))) == [[3], [1, 4, 2, 2], [1], [1, 2]]

Answer 9 (score: 0)

My solution, not very clean, but using only reduce:

from functools import reduce  # reduce is no longer a builtin in Python 3

# int, (int, int, ...) -> ((int, ...), ...)
def grupBySum(capacity, _tuple):

    def _grupBySum(prev, number):
        counter = prev['counter']
        result = prev['result']
        counter = counter + (number,)
        if sum(counter) > capacity:
            result = result + (counter[:-1],)
            return {'counter': (number,), 'result': result}
        else:
            return {'counter': counter, 'result': result}

    # Look the values up by key rather than relying on dict ordering.
    state = reduce(_grupBySum, _tuple, {'counter': (), 'result': ()})
    return state['result'] + (state['counter'],)

f = (3, 1, 4, 2, 2, 1, 1, 2)
h = grupBySum(5, f)
print(h) # -> ((3, 1), (4,), (2, 2, 1), (1, 2))