Question

I have a list of IDs and a list of scores:

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

I want to remove the duplicates from the ids list and sum the corresponding scores. This is very similar to what groupby.sum() does on a DataFrame.

So, as output, I expect:

ids=[1,2,3]
scores=[60,20,40]

I am using the following code, but it does not work in all cases:

for indi ,i in enumerate(ids):
     for indj ,j in enumerate(ids):
           if(i==j) and (indi!=indj):
                  del ids[i]
                  scores[indj]=scores[indi]+scores[indj]
                  del scores[indi]

Answer 1

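You can build a dictionary from ids and scores in which each key is an element of ids and each value is the list of scores that belong to that id. You can then sum those lists to get the new ids and scores lists.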

from collections import defaultdict

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

dct = defaultdict(list)

#Create the dictionary of element of ids vs list of elements of scores
for id, score in zip(ids, scores):
    dct[id].append(score)

print(dct)
#defaultdict(<class 'list'>, {1: [10, 10, 30, 10], 2: [20], 3: [40]})

#Calculate the sum of values, and get the new ids and scores list
new_ids, new_scores = zip(*((key, sum(value)) for key, value in dct.items()))

print(list(new_ids))
print(list(new_scores))

The output will be

[1, 2, 3]
[60, 20, 40]
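
If only the totals matter, the intermediate lists of scores are not strictly needed. A minimal sketch of the same idea that keeps a running sum per id with defaultdict(int); the names totals, id_ and score are only illustrative:

```python
from collections import defaultdict

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# Running total per id; a missing key starts at int() == 0
totals = defaultdict(int)
for id_, score in zip(ids, scores):
    totals[id_] += score

# Dicts keep insertion order (Python 3.7+), so ids come out in first-seen order
new_ids = list(totals)
new_scores = list(totals.values())

print(new_ids)     # [1, 2, 3]
print(new_scores)  # [60, 20, 40]
```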

Answer 2

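As suggested in the comments, using a dictionary is one way to do it. You can iterate over the lists once and update the running sum for each ID.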

If you want two lists at the end, take them from the dictionary with its keys() and values() methods:

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

# Initialize the dict with every id mapped to 0
dict_ = {i: 0 for i in set(ids)}
# Use distinct loop names so the scores list is not overwritten
for id_, score in zip(ids, scores):
    dict_[id_] += score

print(dict_)
# {1: 60, 2: 20, 3: 40}

new_ids = list(dict_.keys())
sum_score = list(dict_.values())
print(new_ids)
# [1, 2, 3]
print(sum_score)
# [60, 20, 40]
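
One thing to be aware of: set(ids) does not promise any particular order, so the keys are not guaranteed to come out in the order the ids first appear. If that order matters, a possible variant is to initialize with dict.fromkeys instead. The sample data below is made up to make the ordering visible:

```python
# Hypothetical sample data, chosen so that first-seen order differs from sorted order
sample_ids = [3, 1, 3, 2]
sample_scores = [40, 10, 5, 20]

# dict.fromkeys keeps the first occurrence of each id, in order (Python 3.7+)
totals = dict.fromkeys(sample_ids, 0)
for id_, score in zip(sample_ids, sample_scores):
    totals[id_] += score

new_ids = list(totals.keys())
sum_score = list(totals.values())

print(new_ids)    # [3, 1, 2]
print(sum_score)  # [45, 10, 20]
```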

Answer 3

Just iterate over both lists together and add the score to the running total when the ID has already been seen.

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
ans={}
for i,s in zip(ids,scores):
    if i in ans:
        ans[i]+=s
    else:
        ans[i]=s
ids, scores=list(ans.keys()), list(ans.values())
print(ids)
print(scores)

Output:

[1, 2, 3]
[60, 20, 40]
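
The snippet above rebinds ids and scores, so the original lists are lost. If you need this in more than one place, it could be wrapped in a small helper that leaves its inputs alone. This is only a sketch: the name sum_by_id and the length check are my own additions, not part of the answer.

```python
def sum_by_id(ids, scores):
    """Return (unique_ids, summed_scores), with ids in first-seen order."""
    # zip() silently drops extra items, so reject mismatched inputs up front
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")

    totals = {}
    for id_, score in zip(ids, scores):
        # dict.get supplies 0 the first time an id is seen
        totals[id_] = totals.get(id_, 0) + score
    return list(totals.keys()), list(totals.values())


new_ids, new_scores = sum_by_id([1, 2, 1, 1, 3, 1], [10, 20, 10, 30, 40, 10])
print(new_ids)     # [1, 2, 3]
print(new_scores)  # [60, 20, 40]
```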

Answer 4

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# Find all unique ids and start each running total at 0
id_to_score = {id_: 0 for id_ in set(ids)}

# Add each score to the total for its id
for id_, score in zip(ids, scores):
    id_to_score[id_] += score

unique_ids = []
score_sum = []
for (i, s) in id_to_score.items():
    unique_ids.append(i)
    score_sum.append(s)

print(unique_ids) # [1, 2, 3]
print(score_sum)  # [60, 20, 40]

Answer 5

This might help you.

#  Solution 1
import pandas as pd

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

df = pd.DataFrame(list(zip(ids, scores)),
                  columns=['ids', 'scores'])


print(df.groupby('ids').sum())

#### Output  ####

     scores
ids        
1        60
2        20
3        40


#  Solution 2
from itertools import groupby
zipped_list  = list(zip(ids, scores))
print([[k, sum(v for _, v in g)] for k, g in groupby(sorted(zipped_list), key = lambda x: x[0])])

#### Output  ####

[[1, 60], [2, 20], [3, 40]]
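
Solution 2 returns a list of [id, sum] pairs. If you would rather end up with two separate lists, as the question asks, one way to adapt it might look like this. It is a sketch using only the standard library, and it sorts on the id alone via operator.itemgetter because groupby only merges adjacent items with equal keys:

```python
from itertools import groupby
from operator import itemgetter

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# groupby only groups consecutive equal keys, so sort by id first
pairs = sorted(zip(ids, scores), key=itemgetter(0))

new_ids = []
new_scores = []
for key, group in groupby(pairs, key=itemgetter(0)):
    new_ids.append(key)
    new_scores.append(sum(score for _, score in group))

print(new_ids)     # [1, 2, 3]
print(new_scores)  # [60, 20, 40]
```

Note that the result comes out sorted by id rather than in first-seen order.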

Answer 6

Using only built-in Python tools, I would do this task as follows:

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
uids = list(set(ids)) # unique ids
for uid in uids:
    print(uid,sum(s for inx,s in enumerate(scores) if ids[inx]==uid))

Output:

1 60
2 20
3 40

The code above only prints the results, but it is easy to change it to build a dict instead:

output_dict = {uid:sum(s for inx,s in enumerate(scores) if ids[inx]==uid) for uid in uids} # {1: 60, 2: 20, 3: 40}

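or some other data structure. Keep in mind that this method makes a separate pass over the data for each unique ID, so it may be slower than the other approaches. Whether that is a problem depends on how much data you have.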
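
To make the cost difference concrete, here is a rough sketch that counts how many elements each approach visits on the example data. The counters visits_per_id and visits_single exist only for this illustration:

```python
ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# Per-id approach: every unique id triggers a full scan of the data
visits_per_id = 0
per_id_totals = {}
for uid in set(ids):
    total = 0
    for id_, score in zip(ids, scores):
        visits_per_id += 1
        if id_ == uid:
            total += score
    per_id_totals[uid] = total

# Single-pass approach: each element is visited exactly once
visits_single = 0
single_totals = {}
for id_, score in zip(ids, scores):
    visits_single += 1
    single_totals[id_] = single_totals.get(id_, 0) + score

print(per_id_totals == single_totals)  # True
print(visits_per_id, visits_single)    # 18 6
```

With 3 unique ids and 6 items the per-id version touches 3 * 6 = 18 elements versus 6 for a single pass, so the gap grows with the number of distinct ids.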

Groupby-like operation on lists
