I have an array with the following structure:
[('path1', 10), ('path2', 12), ('path3', 10), ('path4', 7), ('path5', 18)]
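I would like to split this array into an array of arrays based on the sum of the second element of each tuple. I'd like to have a variable called max_size, and whenever adding the next item would push the running sum of the current group above max_size (20 here), a new list should be started. The result should look like this: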
[(('path1', 10)), (('path2', 12)), (('path3', 10), ('path4', 7)), (('path5', 18))]
How do I do this in Python? I tried to write something like:
max_size = 0
for i, obj in enumerate(temp):
    dfs = []
    for j, obj in enumerate(temp):
        if(max_size < 100):
            max_size = size + obj[1]
            dfs.append(pd.read_csv(obj[0]))
            temp.remove(obj)
            print obj[0]
        else:
            break;
    print i
    print "###" * 10
    grouped.append(dfs)
But it didn't work, and I'm stuck.
What is the best way to do this?
Answer 0 (score: 2)
You can do it like this:
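def group_by_sum(data, max_value):
    total = 0  # running sum of the current group (avoids shadowing the built-in sum)
    start = 0  # index where the current group begins
    result = []
    for i, t in enumerate(data):
        # Close the current group if adding t would exceed max_value.
        # The i > start guard avoids appending an empty group when a
        # single item is itself larger than max_value.
        if i > start and total + t[1] > max_value:
            result.append(data[start:i])
            start = i
            total = 0
        total += t[1]
    result.append(data[start:])  # the last group
    return result

# Example
data = [('path1', 10), ('path2', 12), ('path3', 10), ('path4', 7), ('path5', 18)]
result = group_by_sum(data, 20)
print(result)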
Answer 1 (score: 1)
Without using any libraries, you can do this with:
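def group_threshold(data, max_size):
    result = []
    cur_size = 0       # running sum of the current group
    cur_straight = []  # items collected for the current group
    for datum in data:
        _, size = datum
        # Start a new group if this item would push the sum past max_size.
        # Checking cur_straight avoids appending an empty tuple when a
        # single item is itself larger than max_size.
        if cur_straight and cur_size + size > max_size:
            result.append(tuple(cur_straight))
            cur_straight = []
            cur_size = 0
        cur_size += size
        cur_straight.append(datum)
    result.append(tuple(cur_straight))  # the last group
    return result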
Then you can call it with:
>>> data = [('path1', 10), ('path2', 12), ('path3', 10), ('path4', 7), ('path5', 18)] # the original data
>>> max_size = 20 # the size threshold
>>> group_threshold(data,max_size)
[(('path1', 10),), (('path2', 12),), (('path3', 10), ('path4', 7)), (('path5', 18),)]
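For long inputs, the same greedy grouping could also be written as a generator that yields one group at a time. This is only a sketch and not part of either answer above: the name `iter_groups` is made up for illustration, and it assumes that an item larger than `max_size` should simply end up in a group of its own.

```python
def iter_groups(data, max_size):
    """Yield consecutive groups whose sizes sum to at most max_size.

    Hypothetical helper: a single item larger than max_size is
    yielded alone rather than rejected.
    """
    group = []
    total = 0
    for item in data:
        _, size = item
        # Close the current group if adding this item would overflow it.
        if group and total + size > max_size:
            yield group
            group = []
            total = 0
        group.append(item)
        total += size
    # Emit the last group; nothing is yielded for empty input.
    if group:
        yield group


data = [('path1', 10), ('path2', 12), ('path3', 10), ('path4', 7), ('path5', 18)]
groups = list(iter_groups(data, 20))
print(groups)
```

Because groups are produced lazily, each one can be processed (for example, loaded from disk) before the next is built.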