I have lists of IDs and scores:
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
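I want to remove the duplicates from the ids list and sum the corresponding scores accordingly. This is very similar to what groupby.sum() does with a DataFrame.
So, as output, I expect: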
ids=[1,2,3]
scores=[60,20,40]
I am using the following code, but it does not work in all cases:
for indi, i in enumerate(ids):
    for indj, j in enumerate(ids):
        if (i == j) and (indi != indj):
            del ids[i]
            scores[indj] = scores[indi] + scores[indj]
            del scores[indi]
Answer 0 (score: 1)
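You can build a dictionary from ids and scores in which each key is an element of ids and each value is the list of the scores that correspond to that id. You can then sum those values to get the new ids and scores lists.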
from collections import defaultdict
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
dct = defaultdict(list)
# Create the dictionary mapping each element of ids to a list of its scores
for id, score in zip(ids, scores):
    dct[id].append(score)
print(dct)
#defaultdict(<class 'list'>, {1: [10, 10, 30, 10], 2: [20], 3: [40]})
#Calculate the sum of values, and get the new ids and scores list
new_ids, new_scores = zip(*((key, sum(value)) for key, value in dct.items()))
print(list(new_ids))
print(list(new_scores))
The output will be:
[1, 2, 3]
[60, 20, 40]
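If the intermediate lists of scores are not needed, the sums can also be accumulated directly. This is only a minimal sketch, assuming `defaultdict(int)` in place of `defaultdict(list)`; the names `totals` and `id_` are illustrative and do not come from the answer above.

```python
from collections import defaultdict

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# Missing keys start at int() == 0, so each score can be added directly
totals = defaultdict(int)
for id_, score in zip(ids, scores):
    totals[id_] += score

# Dicts keep insertion order (Python 3.7+), so ids come out in first-seen order
new_ids = list(totals)
new_scores = list(totals.values())
print(new_ids)     # [1, 2, 3]
print(new_scores)  # [60, 20, 40]
```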
Answer 1 (score: 0)
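As suggested in the comments, using a dictionary is one way to do it. You can iterate over the lists once and update the running sum for each ID.
If you want two lists at the end, take the keys and values from the dictionary with its keys() and values() methods: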
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
# Init the dict with all ids at 0
dict_ = {i: 0 for i in set(ids)}
# Use a distinct loop variable so the scores list is not overwritten
for id, score in zip(ids, scores):
    dict_[id] += score
print(dict_)
# {1: 60, 2: 20, 3: 40}
new_ids = list(dict_.keys())
sum_score = list(dict_.values())
print(new_ids)
# [1, 2, 3]
print(sum_score)
# [60, 20, 40]
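One caveat about initialising from `set(ids)`: a set has no guaranteed iteration order, so the resulting ids need not follow the order in which they first appear. If that order matters, one option is to initialise with `dict.fromkeys`, which keeps keys in first-seen order on Python 3.7+. The sketch below uses made-up input data chosen to make the ordering visible.

```python
# Made-up input where first-seen order (3, 1, 2) differs from sorted order
ids = [3, 1, 3, 2]
scores = [5, 10, 15, 20]

# dict.fromkeys keeps the first occurrence of each id, in order
totals = dict.fromkeys(ids, 0)
for id_, score in zip(ids, scores):
    totals[id_] += score

new_ids = list(totals.keys())
sum_score = list(totals.values())
print(new_ids)    # [3, 1, 2]
print(sum_score)  # [20, 10, 20]
```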
Answer 2 (score: 0)
Just loop over them and add the score to the running total whenever the id has already been seen.
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
ans={}
for i, s in zip(ids, scores):
    if i in ans:
        ans[i] += s
    else:
        ans[i] = s
ids, scores = list(ans.keys()), list(ans.values())
print(ids)
print(scores)
Output:
[1, 2, 3]
[60, 20, 40]
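The same loop can be written without the if/else by using `dict.get` with a default of 0. Below is a sketch wrapped in a hypothetical helper, `sum_by_id` (the name and the length check are my own additions, not part of the answer), so the input lists are left unmodified.

```python
def sum_by_id(ids, scores):
    """Return (unique_ids, summed_scores), ids in first-seen order."""
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    totals = {}
    for i, s in zip(ids, scores):
        # get() returns 0 the first time an id is seen
        totals[i] = totals.get(i, 0) + s
    return list(totals.keys()), list(totals.values())

unique_ids, summed = sum_by_id([1, 2, 1, 1, 3, 1], [10, 20, 10, 30, 40, 10])
print(unique_ids)  # [1, 2, 3]
print(summed)      # [60, 20, 40]
```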
Answer 3 (score: 0)
# Find all unique ids and keep track of their scores
id_to_score = {id : 0 for id in set(ids)}
# Sum up the scores for that id
for index, id in enumerate(ids):
    id_to_score[id] += scores[index]
unique_ids = []
score_sum = []
for (i, s) in id_to_score.items():
    unique_ids.append(i)
    score_sum.append(s)
print(unique_ids) # [1, 2, 3]
print(score_sum) # [60, 20, 40]
Answer 4 (score: 0)
This may help you.
# Solution 1
import pandas as pd
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
df = pd.DataFrame(list(zip(ids, scores)),
                  columns=['ids', 'scores'])
print(df.groupby('ids').sum())
#### Output ####
#      scores
# ids
# 1        60
# 2        20
# 3        40
# Solution 2
from itertools import groupby
zipped_list = list(zip(ids, scores))
print([[k, sum(v for _, v in g)] for k, g in groupby(sorted(zipped_list), key = lambda x: x[0])])
#### Output ####
# [[1, 60], [2, 20], [3, 40]]
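A note on why Solution 2 sorts first: `itertools.groupby` only groups consecutive items with equal keys, so on unsorted input the same id can show up in several groups. The sketch below illustrates this with the question's data; the helper `sum_runs` is a hypothetical name introduced for the comparison.

```python
from itertools import groupby
from operator import itemgetter

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]
pairs = list(zip(ids, scores))

def sum_runs(pairs):
    # Sum the scores of each consecutive run of equal ids
    return [[k, sum(s for _, s in g)] for k, g in groupby(pairs, key=itemgetter(0))]

unsorted_result = sum_runs(pairs)
sorted_result = sum_runs(sorted(pairs, key=itemgetter(0)))
print(unsorted_result)  # [[1, 10], [2, 20], [1, 40], [3, 40], [1, 10]]
print(sorted_result)    # [[1, 60], [2, 20], [3, 40]]
```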
Answer 5 (score: 0)
Using only built-in Python tools, I would do the task as follows:
ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
uids = list(set(ids)) # unique ids
for uid in uids:
    print(uid, sum(s for inx, s in enumerate(scores) if ids[inx] == uid))
Output:
1 60
2 20
3 40
The code above just prints the result, but it can easily be changed to produce a dict:
output_dict = {uid:sum(s for inx,s in enumerate(scores) if ids[inx]==uid) for uid in uids} # {1: 60, 2: 20, 3: 40}
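or some other data structure. Keep in mind that this method needs a separate pass over the data for each unique ID, so it may be slower than the other approaches. Whether that is a problem depends on how much data you have.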
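To get a feel for the difference, the two strategies can be timed side by side. This is a rough sketch on made-up random data; the sizes, the seed, and the function names `per_id_pass` and `single_pass` are arbitrary choices, and the timings will vary by machine, so none are shown.

```python
import random
import timeit

random.seed(0)  # fixed seed so the made-up data is reproducible
n = 2000
ids = [random.randrange(200) for _ in range(n)]
scores = [random.randrange(100) for _ in range(n)]

def per_id_pass(ids, scores):
    # One full scan of the data for every unique id
    return {uid: sum(s for i, s in zip(ids, scores) if i == uid) for uid in set(ids)}

def single_pass(ids, scores):
    # One scan in total, updating a running sum per id
    totals = {}
    for i, s in zip(ids, scores):
        totals[i] = totals.get(i, 0) + s
    return totals

# Both strategies must agree before comparing their speed
assert per_id_pass(ids, scores) == single_pass(ids, scores)
print(timeit.timeit(lambda: per_id_pass(ids, scores), number=5))
print(timeit.timeit(lambda: single_pass(ids, scores), number=5))
```

The per-id version does work proportional to (number of unique ids) × (number of items), while the single pass is proportional to the number of items alone.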