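I have some tuples in Python and a capacity limit, for example 5. I want to split a tuple into sub-tuples, each bounded by the sum of its elements:
For example:
input: (3, 1, 4, 2, 2, 1, 1, 2) and capacity = 5
output: (3, 1) (4) (2, 2, 1) (1, 2) # each sub-tuple sums to at most 5, order preserved
I'm looking for a nicely expressive solution to this task, preferably in a functional programming style (for example using itertools.dropwhile or something similar).
Answer 0 (score: 14)
You can encapsulate the non-functional part and call it from functional code: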
from itertools import groupby

class GroupBySum:
    def __init__(self, maxsum):
        self.maxsum = maxsum
        self.index = 0
        self.sum = 0

    def __call__(self, value):
        # Stateful key function: bump the group index whenever the running sum overflows.
        self.sum += value
        if self.sum > self.maxsum:
            self.index += 1
            self.sum = value
        return self.index

# Example:
for _, l in groupby((3, 1, 4, 2, 2, 1, 1, 2), GroupBySum(5)):
    print(list(l))
Answer 1 (score: 6)
I couldn't help writing something close to what I would do in Haskell (while still somewhat Pythonic, I think):
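def take_summed(xs, cap):
    # Return (longest prefix of xs whose sum fits in cap, remaining elements).
    if not xs:
        return (), ()
    x, *rest = xs
    if x > cap:
        return (), tuple(xs)
    init, tail = take_summed(rest, cap - x)
    return (x,) + init, tail

def split(xs, cap=5):
    if len(xs) <= 1:
        yield tuple(xs)
    else:
        chunk, rest = take_summed(xs, cap)
        if not chunk:
            # The first element alone exceeds cap: emit it by itself so the recursion makes progress.
            chunk, rest = tuple(xs[:1]), tuple(xs[1:])
        yield chunk
        if rest:
            yield from split(rest, cap)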
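Don't hesitate to decompose the function into sub-problems. Result:
In [45]: list(split((3, 1, 4, 2, 2, 1, 1, 2), 5))
Out[45]: [(3, 1), (4,), (2, 2, 1), (1, 2)]
The difficulty in making this shorter is not that it is infeasible without side effects, but that you have to carry additional accumulated state around. Even with reduce you would need to invent something fairly complex to pass the running sum from one application to the next.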
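To illustrate the point about carrying state through reduce, here is a minimal sketch (not taken from the answer above) in which the accumulator is a pair: the groups built so far and the running sum of the last group. The names split_with_reduce and step are made up for this illustration.

```python
from functools import reduce

def split_with_reduce(xs, cap=5):
    # Hypothetical helper: the accumulator is (groups so far, running sum of the last group).
    def step(acc, x):
        groups, total = acc
        if groups and total + x <= cap:
            # x still fits: extend the last group
            return groups[:-1] + (groups[-1] + (x,),), total + x
        # otherwise start a new group containing only x
        return groups + ((x,),), x

    groups, _ = reduce(step, xs, ((), 0))
    return groups

print(split_with_reduce((3, 1, 4, 2, 2, 1, 1, 2)))
```

In this sketch a single value larger than cap ends up in a group of its own. Rebuilding the tuple of groups on every step is quadratic in the worst case, so this is meant to show the shape of the state rather than to be fast.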
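Answer 2 (score: 4)
Here is a slightly different approach from @Jean's: it slices the input tuple instead of building smaller lists with append, which gives a small performance gain: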
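def group_by_capacity(tup, capacity=5):
    t = iter(tup)
    s = next(t, None)
    if s is None:
        return  # empty input: nothing to yield
    curr = 0
    for i, v in enumerate(t, 1):
        if s + v > capacity:
            # v does not fit: emit the slice collected so far and start a new one at i
            yield tup[curr:i]
            curr = i
            s = v
        else:
            s += v
    yield tup[curr:]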
>>> list(group_by_capacity((3, 1, 4, 2, 2, 1, 1, 2)))
[(3, 1), (4,), (2, 2, 1), (1, 2)]
Some timings:
In [35]: from random import randrange
In [36]: start = tuple((randrange(1,5) for _ in range(100000)))
In [37]: %%timeit
....: list(group_by_capacity(start))
....:
10 loops, best of 3: 47.4 ms per loop
In [38]: %%timeit
....: list(generate_tuple(start))
....:
10 loops, best of 3: 61.1 ms per loop
Answer 3 (score: 4)
I'm a little surprised that nobody has used itertools.accumulate with a key function yet. Anyway, my entry:
from itertools import groupby, accumulate

def sumgroup(seq, capacity):
    # Each accumulated item is (group index, running sum of that group).
    divided = accumulate(enumerate(seq),
                         lambda x, y: (x[0], x[1] + y[1])
                         if x[1] + y[1] <= capacity else (x[0] + 1, y[1]))
    seq_iter = iter(seq)
    grouped = groupby(divided, key=lambda x: x[0])
    return [[next(seq_iter) for _ in g] for _, g in grouped]
There are plenty of variants, e.g. you could use zip(seq, divided) to avoid seq_iter, but this was the first way that came to mind. It gives me
In [105]: seq = [3, 1, 4, 2, 2, 1, 1, 2]
In [106]: sumgroup(seq, 5)
Out[106]: [[3, 1], [4], [2, 2, 1], [1, 2]]
and agrees with the GroupBySum result:
In [108]: all(sumgroup(p, 5) == [list(l) for _, l in groupby(p, GroupBySum(5))]
...: for width in range(1,8) for p in product(range(1,6), repeat=width))
...:
...:
Out[108]: True
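The zip(seq, divided) variant mentioned above could look roughly like the following. This is only a sketch of one way to do it, and the name sumgroup_zip is made up here. It assumes seq is a re-iterable sequence such as a list or tuple, because it is traversed twice.

```python
from itertools import accumulate, groupby

def sumgroup_zip(seq, capacity):
    # Running (group index, running sum) pairs, built the same way as in sumgroup.
    divided = accumulate(
        enumerate(seq),
        lambda x, y: (x[0], x[1] + y[1])
        if x[1] + y[1] <= capacity else (x[0] + 1, y[1]),
    )
    # Pair every value with its state, then group on the group index.
    grouped = groupby(zip(seq, divided), key=lambda pair: pair[1][0])
    return [[value for value, _ in g] for _, g in grouped]

print(sumgroup_zip([3, 1, 4, 2, 2, 1, 1, 2], 5))
```

Pairing the values with their state removes the separate seq_iter iterator, at the cost of unpacking each pair again inside the list comprehension.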
Answer 4 (score: 3)
I was waiting for a first answer before offering a slightly functional approach:
start = (3, 1, 4, 2, 2, 1, 1, 2)

def generate_tuple(inp):
    current_sum = 0
    current_list = []
    for e in inp:
        if current_sum + e <= 5:
            current_list.append(e)
            current_sum += e
        else:
            if current_list:  # fixes "6" in first position empty tuple bug
                yield tuple(current_list)
            current_list = [e]
            current_sum = e
    yield tuple(current_list)

print([i for i in generate_tuple(start)])
Result:
[(3, 1), (4,), (2, 2, 1), (1, 2)]
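Edit: I found a fully functional approach that relies on a memory effect; without it, this isn't feasible. It's ugly, and it only hurts when I think about how I would explain it clearly. I made the input data set a little harder, otherwise it would be too easy: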
start = (6, 7, 3, 1, 4, 2, 2, 1, 1, 2, 3, 1 ,3, 1, 1)
Now the code. Three lines. Get some aspirin, you'll need it as much as I did:
mem=[0,0]
start = start + (5,)
print([start[mem[-2]:n] for i in range(0,len(start)) for n in range(i+1,len(start)) if ((n==i+1 and start[i]>=5) or (sum(start[mem[-1]:n])<=5 and sum(start[mem[-1]:n+1])>5)) and not mem.append(n)])
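I'll try to explain. mem holds the slice boundaries found so far and is set to [0, 0] at the start. Since append returns None, the last condition (not mem.append(n)) is always true. The test (n==i+1 and start[i]>=5) detects single values greater than or equal to 5.
Answer 5 (score: 1)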
Not sure why you need them all in tuples, but if you don't, you can simply remove the tuple(...) casts. An example:
def chunkit(tpl, capacity):
    ret = []
    cur = []
    for x in tpl:
        # The "cur and" guard avoids emitting an empty tuple when the first value is too large.
        if cur and sum(cur) + x > capacity:
            ret.append(tuple(cur))
            cur = [x]
        else:
            cur.append(x)
    if cur:
        ret.append(tuple(cur))
    return tuple(ret)
Answer 6 (score: 1)
Not sure whether this counts as functional, but it's the closest I could come up with:
import itertools

def groupLimit(iterable, limit):
    i, cSum = 0, 0

    def pred(x):
        nonlocal i, cSum
        # Start a new group (i + 1) when x would push the running sum over the limit.
        i, cSum = (i + 1, x) if (x + cSum) > limit else (i, cSum + x)
        return i if x <= limit else -1

    return (tuple(g) for k, g in itertools.groupby(iterable, pred) if k != -1)
This also filters out single values that are greater than the limit. If that is not intended, change the two return lines to:
        return i
    return (tuple(g) for k, g in itertools.groupby(iterable, pred))
Example:
t = (3, 1, 6, 2, 2, 1, 1, 2)
a = groupLimit(t,5)
print(tuple(a))
# version 1 -> ((3, 1), (2, 2, 1), (1, 2))
# version 2 -> ((3, 1), (6,), (2, 2, 1), (1, 2))
Answer 7 (score: 1)
Let's use itertools:
from itertools import chain, combinations

def powerset(lst):
    for subset in chain.from_iterable(combinations(lst, r) for r in range(len(lst)+1)):
        yield subset
Then we can do it in a single line:
[subset for subset in powerset(input) if sum(subset)<=capacity]
Answer 8 (score: 1)
A more general solution:
def groupwhile(iterable, predicate, accumulator_function):
    continue_group = False
    iterator = iter(iterable)
    try:
        accumulated = next(iterator)
    except StopIteration:
        return
    current_group = [accumulated]
    for item in iterator:
        continue_group = predicate(accumulated, item)
        if continue_group:
            current_group.append(item)
            accumulated = accumulator_function(accumulated, item)
        else:
            yield current_group
            accumulated = item
            current_group = [item]
    yield current_group
#your case
assert (list(groupwhile(
(3, 1, 4, 2, 2, 1, 1, 2),
lambda previous_sum,item: previous_sum + item <= 5,
lambda previous_sum,item: previous_sum + item,
))) == [[3, 1], [4], [2, 2, 1], [1, 2]]
#equivalent to groupby with key not set
assert (list(groupwhile(
(3, 1, 4, 2, 2, 1, 1, 2),
lambda previous_item,item: previous_item == item,
lambda _,item: item,
))) == [[3], [1], [4], [2, 2], [1, 1], [2]]
#break on duplicates
assert (list(groupwhile(
(3, 1, 4, 2, 2, 1, 1, 2),
lambda previous_item,item: previous_item != item,
lambda _,item: item,
))) == [[3, 1, 4, 2], [2, 1], [1, 2]]
#start new group when the number is one
assert (list(groupwhile(
(3, 1, 4, 2, 2, 1, 1, 2),
lambda _,item: item != 1,
lambda _1,_2: None,
))) == [[3], [1, 4, 2, 2], [1], [1, 2]]
Answer 9 (score: 0)
My solution is not very clean, but it uses only reduce:
from functools import reduce  # reduce is not a builtin in Python 3

# int, (int, int, ...) -> ((int, ...), ...)
def grupBySum(capacity, _tuple):
    def _grupBySum(prev, number):
        counter = prev['counter']
        result = prev['result']
        counter = counter + (number,)
        if sum(counter) > capacity:
            result = result + (counter[:-1],)
            return {'counter': (number,), 'result': result}
        else:
            return {'counter': counter, 'result': result}

    final = reduce(_grupBySum, _tuple, {'counter': (), 'result': ()})
    # Look the values up by key instead of indexing dict.values(), which is not subscriptable in Python 3.
    return final['result'] + (final['counter'],)

f = (3, 1, 4, 2, 2, 1, 1, 2)
h = grupBySum(5, f)
print(h)  # -> ((3, 1), (4,), (2, 2, 1), (1, 2))