An operation similar to group by

Time: 2019-07-04 17:55:41

Tags: python list

I have a list of IDs and a list of scores:

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

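I want to remove the duplicates from the ids list and sum the corresponding scores accordingly. This is very similar to what groupby.sum() does when working with a DataFrame.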

So, as output, I expect:

ids=[1,2,3]
scores=[60,20,40]

I am using the following code, but it does not work in all cases:

for indi ,i in enumerate(ids):
     for indj ,j in enumerate(ids):
           if(i==j) and (indi!=indj):
                  del ids[i]
                  scores[indj]=scores[indi]+scores[indj]
                  del scores[indi]

6 answers:

Answer 0 (score: 1)

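You can build a dictionary from ids and scores in which each key is an element of ids and each value is the list of scores that correspond to that id. You can then sum those values to get the new ids and scores lists.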

from collections import defaultdict

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

dct = defaultdict(list)

#Create the dictionary of element of ids vs list of elements of scores
for id, score in zip(ids, scores):
    dct[id].append(score)

print(dct)
#defaultdict(<class 'list'>, {1: [10, 10, 30, 10], 2: [20], 3: [40]})

#Calculate the sum of values, and get the new ids and scores list
new_ids, new_scores = zip(*((key, sum(value)) for key, value in dct.items()))

print(list(new_ids))
print(list(new_scores))

The output will be:

[1, 2, 3]
[60, 20, 40]
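
The same idea can be packaged as a small reusable function. The following is only a sketch: the name group_sum is hypothetical (it does not come from the answer above), and it keeps a running total per id instead of storing the intermediate lists. It assumes Python 3.7+, where dictionaries preserve insertion order, so the ids come back in the order they first appear.

```python
from collections import defaultdict

def group_sum(ids, scores):
    """Sum scores per id and return (unique_ids, summed_scores) in first-seen order."""
    totals = defaultdict(int)  # missing keys start at 0
    for key, score in zip(ids, scores):
        totals[key] += score
    return list(totals), list(totals.values())

new_ids, new_scores = group_sum([1, 2, 1, 1, 3, 1], [10, 20, 10, 30, 40, 10])
print(new_ids)     # [1, 2, 3]
print(new_scores)  # [60, 20, 40]
```

Summing directly avoids holding every individual score in memory, which may matter if the per-id lists are not needed afterwards.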

Answer 1 (score: 0)

As suggested in the comments, using a dictionary is one way to do it. You can iterate over the lists once and update the sum for each ID.

If you want two lists at the end, take the keys and the values from the dictionary with its keys() and values() methods.

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

# Init the dict with all ids at 0
dict_ = {i: 0 for i in set(ids)}
# Use `score` (not `scores`) as the loop variable so the scores list is not overwritten
for id_, score in zip(ids, scores):
    dict_[id_] += score

print(dict_)
# {1: 60, 2: 20, 3: 40}

new_ids = list(dict_.keys())
sum_score = list(dict_.values())
print(new_ids)
# [1, 2, 3]
print(sum_score)
# [60, 20, 40]
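
One caveat when initializing from set(ids): a set has no guaranteed iteration order, so the keys may not come out in the order in which the ids first appear. If that order matters, a possible variant is to initialize with dict.fromkeys instead. This is a minimal sketch, assuming Python 3.7+ (where dicts preserve insertion order); the sample data below is made up for illustration.

```python
# Hypothetical sample data with non-integer ids
ids = ["b", "a", "b", "c"]
scores = [1, 2, 3, 4]

# dict.fromkeys keeps the first-seen order of the ids and drops duplicates
totals = dict.fromkeys(ids, 0)
for key, score in zip(ids, scores):
    totals[key] += score

print(list(totals.keys()))    # ['b', 'a', 'c']
print(list(totals.values()))  # [4, 2, 4]
```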

Answer 2 (score: 0)

Just loop over them and add to the running total when the ID has already been seen.

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
ans={}
for i,s in zip(ids,scores):
    if i in ans:
        ans[i]+=s
    else:
        ans[i]=s
ids, scores = list(ans.keys()), list(ans.values())
print(ids)
print(scores)

Output:

[1, 2, 3]
[60, 20, 40]

Answer 3 (score: 0)

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# Find all unique ids and keep track of their scores
id_to_score = {id : 0 for id in set(ids)}

# Sum up the scores for that id
for index, id in enumerate(ids):
    id_to_score[id] += scores[index]

unique_ids = []
score_sum = []
for (i, s) in id_to_score.items():
    unique_ids.append(i)
    score_sum.append(s)

print(unique_ids) # [1, 2, 3]
print(score_sum)  # [60, 20, 40]

Answer 4 (score: 0)

This may help you.

#  Solution 1
import pandas as pd

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]

df = pd.DataFrame(list(zip(ids, scores)),
                  columns=['ids', 'scores'])


print(df.groupby('ids').sum())

#### Output  ####

     scores
ids        
1        60
2        20
3        40


#  Solution 2
from itertools import groupby
zipped_list  = list(zip(ids, scores))
print([[k, sum(v for _, v in g)] for k, g in groupby(sorted(zipped_list), key = lambda x: x[0])])

#### Output  ####

[[1, 60], [2, 20], [3, 40]]
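
The sorted call in Solution 2 matters because itertools.groupby only groups consecutive items with the same key. The sketch below illustrates this on the same data; the helper name sum_runs is hypothetical and exists only for this illustration.

```python
from itertools import groupby
from operator import itemgetter

ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]
pairs = list(zip(ids, scores))

def sum_runs(pairs):
    # groupby starts a new group every time the key changes
    return [(k, sum(s for _, s in g)) for k, g in groupby(pairs, key=itemgetter(0))]

# Without sorting, id 1 is split into several separate runs
print(sum_runs(pairs))
# [(1, 10), (2, 20), (1, 40), (3, 40), (1, 10)]

# Sorting by id first brings equal ids together
print(sum_runs(sorted(pairs, key=itemgetter(0))))
# [(1, 60), (2, 20), (3, 40)]
```

Sorting makes this approach O(n log n), whereas the dictionary-based answers make a single linear pass.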

Answer 5 (score: 0)

Using only built-in Python tools, I would do the task as follows:

ids=[1,2,1,1,3,1]
scores=[10,20,10,30,40,10]
uids = list(set(ids)) # unique ids
for uid in uids:
    print(uid,sum(s for inx,s in enumerate(scores) if ids[inx]==uid))

Output:

1 60
2 20
3 40

The code above only prints the results, but it is easy to change it to produce a dict instead:

output_dict = {uid:sum(s for inx,s in enumerate(scores) if ids[inx]==uid) for uid in uids} # {1: 60, 2: 20, 3: 40}

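or any other data structure. Keep in mind that this method makes a separate pass over the data for each unique ID, so it may be slower than the other approaches. Whether that is a problem depends on how much data you have.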
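
To make the cost difference concrete, here is a rough sketch that counts how many elements each approach visits on the sample data. The step counters are purely illustrative and are not part of either method.

```python
ids = [1, 2, 1, 1, 3, 1]
scores = [10, 20, 10, 30, 40, 10]

# Per-id approach: one full scan of the data for every unique id
per_id_steps = 0
per_id = {}
for uid in set(ids):
    total = 0
    for key, score in zip(ids, scores):
        per_id_steps += 1
        if key == uid:
            total += score
    per_id[uid] = total

# Single-pass approach: each element is visited exactly once
one_pass_steps = 0
one_pass = {}
for key, score in zip(ids, scores):
    one_pass_steps += 1
    one_pass[key] = one_pass.get(key, 0) + score

print(per_id == one_pass)            # True
print(per_id_steps, one_pass_steps)  # 18 6
```

With 3 unique ids and 6 elements the per-id approach visits 18 elements versus 6 for a single pass; the gap grows with the number of unique ids.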