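I need to sum the values of duplicate entries and then delete the duplicates.
I am looking for the most efficient pattern to add up the count field of tuples that share an id, and then remove the duplicates that were merged.
Each tuple has the form (name, id, age, count):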
facts = [('john', 1, 22, 1),('smit', 2, 17, 1),('john', 1, 22, 2),('nick', 3, 43, 1),('john', 1, 22, 1)]
from operator import itemgetter

def sum_and_sort_facts(self, facts: list):
    # Nothing to merge for zero or one tuple
    if len(facts) <= 1:
        return facts
    buffer_list = []
    for i, f in enumerate(facts):
        # Skip ids that have already been merged into buffer_list
        if f[1] in [x[1] for x in buffer_list]:
            continue
        total = f[3]
        for ic, fc in enumerate(facts):
            if i == ic:
                continue
            # Same id: add this tuple's count to the running total
            if f[1] == fc[1]:
                total += fc[3]
        # Append one merged tuple per id
        buffer_list.append((f[0], f[1], f[2], total))
    # Sort by count, largest first
    return sorted(buffer_list, key=itemgetter(3), reverse=True)
I want to get: facts = [('john', 1, 22, 4), ('smit', 2, 17, 1), ('nick', 3, 43, 1)]
Answer 0 (score: 0)
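A one-liner with a list comprehension. It groups by name (x[0]), and because it passes through set, the order of the result is not guaranteed: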
output = list(set([(x[0], x[1], x[2], sum([y[3] for y in facts if y[0]==x[0]])) for x in facts]))
[('smit', 2, 17, 1), ('nick', 3, 43, 1), ('john', 1, 22, 4)]
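The comprehension above rescans facts once per tuple, so it does O(n²) work. A single pass over the list with a dict is enough. This is a minimal sketch, assuming (as in the question's own code) that the id at index 1 identifies duplicates and that name and age are the same for a given id; merge_by_id is a hypothetical helper name:

```python
facts = [('john', 1, 22, 1), ('smit', 2, 17, 1), ('john', 1, 22, 2),
         ('nick', 3, 43, 1), ('john', 1, 22, 1)]

def merge_by_id(facts):
    """Sum the count field of tuples sharing an id, keeping first-seen order.

    Assumes name and age are identical for every tuple with the same id.
    """
    merged = {}  # id -> [name, id, age, running count]; dicts keep insertion order
    for name, id_, age, count in facts:
        if id_ in merged:
            merged[id_][3] += count  # duplicate id: add its count
        else:
            merged[id_] = [name, id_, age, count]  # first time this id is seen
    return [tuple(row) for row in merged.values()]

print(merge_by_id(facts))
# [('john', 1, 22, 4), ('smit', 2, 17, 1), ('nick', 3, 43, 1)]
```

Each tuple is visited once and dict lookups are O(1) on average, so the whole merge is O(n). To match the question's function exactly, wrap the result in sorted(..., key=itemgetter(3), reverse=True).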
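This can also be done with pandas while keeping the original order of first appearance (groupby sorts the group keys unless sort=False is passed):
import pandas as pd

data = [('john', 1, 22, 1), ('smit', 2, 17, 1), ('john', 1, 22, 2), ('nick', 3, 43, 1), ('john', 1, 22, 1)]
df = pd.DataFrame(data)
# Group on name, id and age; sort=False keeps groups in order of first appearance
df = df.groupby(by=[0, 1, 2], sort=False).agg({3: 'sum'}).reset_index()
output = [tuple(l) for l in df.values.tolist()]
print(output)
[('john', 1, 22, 4), ('smit', 2, 17, 1), ('nick', 3, 43, 1)]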
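If pandas is not available and the result should also be sorted by count, as the question's function does, the standard library's collections.Counter can do both steps. A sketch, assuming the (name, id, age) triple identifies a record:

```python
from collections import Counter

facts = [('john', 1, 22, 1), ('smit', 2, 17, 1), ('john', 1, 22, 2),
         ('nick', 3, 43, 1), ('john', 1, 22, 1)]

# Key on (name, id, age) and add up the count field; missing keys start at 0
totals = Counter()
for name, id_, age, count in facts:
    totals[(name, id_, age)] += count

# most_common() orders by total, largest first; ties keep first-seen order
result = [(*key, total) for key, total in totals.most_common()]
print(result)
# [('john', 1, 22, 4), ('smit', 2, 17, 1), ('nick', 3, 43, 1)]
```

This is a single O(n) pass plus one sort of the unique records, which replaces the nested loop and the final sorted(..., key=itemgetter(3), reverse=True) call.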