Remove duplicate values of a specific key and sum the other key values in Python

Time: 2017-05-15 10:50:46

Tags: python arrays duplicates

I have an array like this:
[{'activityCount': 0, 'jobCount': 0, 'oId': u'57e229cc8741833c738b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'58660bc587418325258b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'5783a71a874183e3158b4568'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'5783a71a874183e3158b4568'},
 {'activityCount': 1, 'jobCount': 0, 'oId': u'58650ad5874183df748b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'57dccedc87418359718b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'57e229cc8741833c738b4567'},
 {'activityCount': 0, 'jobCount': 1, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'}]

I want to remove the duplicate oId values and add up the activityCount and jobCount values of the duplicates into a single entry, like this:

[{'activityCount': 1, 'jobCount': 11, 'oId': u'57e229cc8741833c738b4567'},
 {'activityCount': 2, 'jobCount': 10, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 7, 'jobCount': 4, 'oId': u'57dccedc87418359718b4567'}]

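That is, add up the other key values of all the duplicates and store them in a single entry.

Edit: I know how to remove the duplicates, but I don't know how to add up the other values associated with them.

4 Answers:

Answer 0: (score: 1)

You can iterate over the list and convert it into a dictionary that uses oId as the key and stores the accumulated counts, for example: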

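# d is the input list of dicts from the question
tmp = {}

for row in d:
    oid = row['oId']
    if oid in tmp:
        # Already seen: add this row's counts to the running totals
        tmp[oid]['activityCount'] += row['activityCount']
        tmp[oid]['jobCount'] += row['jobCount']
    else:
        # First occurrence: start the totals from this row
        tmp[oid] = {'activityCount': row['activityCount'], 'jobCount': row['jobCount']}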

You can use this dictionary as it is or, if needed, convert it back to a list.
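
For the last step, converting the dictionary back to a list, one possible sketch is shown below. The three-row sample data and the names totals and result are made up for illustration, and setdefault is used here only as a compact alternative to the if/else above.

```python
# Hypothetical three-row sample standing in for the full input list d
d = [
    {'activityCount': 1, 'jobCount': 0, 'oId': 'a'},
    {'activityCount': 2, 'jobCount': 3, 'oId': 'b'},
    {'activityCount': 4, 'jobCount': 5, 'oId': 'a'},
]

# Accumulate per-oId totals in a dictionary keyed by oId
tmp = {}
for row in d:
    totals = tmp.setdefault(row['oId'], {'activityCount': 0, 'jobCount': 0})
    totals['activityCount'] += row['activityCount']
    totals['jobCount'] += row['jobCount']

# Convert back to a list of records, putting oId back into each one
result = [dict(counts, oId=oid) for oid, counts in tmp.items()]
```

Each entry of result has the same three keys as the input rows, with one entry per unique oId.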

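Answer 1: (score: 1)

You can try this!

  • For each input item, check whether its oId already exists in the new list
  • If it does not exist, append the item
  • Otherwise, add its counts to the existing entry

That is,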

d=[{'activityCount': 0, 'jobCount': 0, 'oId': u'57e229cc8741833c738b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'58660bc587418325258b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'5783a71a874183e3158b4568'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'5783a71a874183e3158b4568'},
 {'activityCount': 1, 'jobCount': 0, 'oId': u'58650ad5874183df748b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'57dccedc87418359718b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'57e229cc8741833c738b4567'},
 {'activityCount': 1, 'jobCount': 1, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 1, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 13, 'jobCount': 11, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'55a646a1874183dc018b4567'}]
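final = []  # aggregated records, in order of first appearance
uniq = []   # oId values already seen
for each in d:
    if each['oId'] not in uniq:
        uniq.append(each['oId'])
        # Copy the record so the rows in d are not modified by the sums below
        final.append(dict(each))
    else:
        for data in final:
            if data['oId'] == each['oId']:
                data['activityCount'] += each['activityCount']
                data['jobCount'] += each['jobCount']
                break
print(final)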

Output:

[{'activityCount': 0, 'oId': u'57e229cc8741833c738b4567', 'jobCount': 0}, {'activityCount': 14, 'oId': u'55a646a1874183dc018b4567', 'jobCount': 13}, {'activityCount': 0, 'oId': u'58660bc587418325258b4567', 'jobCount': 0}, {'activityCount': 0, 'oId': u'5783a71a874183e3158b4568', 'jobCount': 0}, {'activityCount': 1, 'oId': u'58650ad5874183df748b4567', 'jobCount': 0}, {'activityCount': 0, 'oId': u'57dccedc87418359718b4567', 'jobCount': 0}]

Answer 2: (score: 1)

This is how I would do it:

from collections import Counter

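# array is the input list of dicts from the question
activityCount = Counter()
jobCount = Counter()
for record in array:
    # A Counter returns 0 for a missing key, so += works the first time an oId is seen
    activityCount[record['oId']] += record['activityCount']
    jobCount[record['oId']] += record['jobCount']

# Rebuild one record per unique oId from the two counters
new_array = []
for key in activityCount:
    new_array.append({
        'oId': key,
        'activityCount': activityCount[key],
        'jobCount': jobCount[key],
    })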
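
If more fields need to be summed later, a possible variation on the same idea is to keep one Counter per oId instead of one Counter per field. This is only a sketch: the sample data, the FIELDS tuple, and the name totals are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical three-row sample standing in for the full input list
array = [
    {'activityCount': 1, 'jobCount': 0, 'oId': 'a'},
    {'activityCount': 2, 'jobCount': 3, 'oId': 'b'},
    {'activityCount': 4, 'jobCount': 5, 'oId': 'a'},
]

# Fields to sum; extend this tuple if more counters are added later
FIELDS = ('activityCount', 'jobCount')

# One Counter per oId, created on first access by defaultdict
totals = defaultdict(Counter)
for record in array:
    for field in FIELDS:
        totals[record['oId']][field] += record[field]

# Turn each Counter into a plain dict and put oId back in
new_array = [dict(counts, oId=oid) for oid, counts in totals.items()]
```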

Answer 3: (score: 1)

Try a list comprehension + groupby

from itertools import groupby

key = lambda x: x['oId']
# groupby yields each group as a one-shot iterator, so turn it into a list
# before summing two different fields from it
groups = [(name, list(grp)) for name, grp in groupby(sorted(d, key=key), key=key)]
result = [{'activityCount': sum(i['activityCount'] for i in rows),
           'jobCount': sum(i['jobCount'] for i in rows), 'oId': name}
          for name, rows in groups]

Result (with the array from the question as d)

[{'activityCount': 0, 'jobCount': 1, 'oId': u'55a646a1874183dc018b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'5783a71a874183e3158b4568'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'57dccedc87418359718b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'57e229cc8741833c738b4567'},
 {'activityCount': 1, 'jobCount': 0, 'oId': u'58650ad5874183df748b4567'},
 {'activityCount': 0, 'jobCount': 0, 'oId': u'58660bc587418325258b4567'}]
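
The sorted call matters here because groupby only merges consecutive rows with the same key. The small sketch below shows the difference; the sample rows and the helper names get_oid and totals are made up for illustration.

```python
from itertools import groupby

# Hypothetical sample in which the two 'a' rows are not adjacent
rows = [
    {'oId': 'a', 'jobCount': 1},
    {'oId': 'b', 'jobCount': 2},
    {'oId': 'a', 'jobCount': 3},
]

def get_oid(row):
    return row['oId']

def totals(records):
    # One (oId, summed jobCount) pair per run of consecutive equal oIds
    return [(oid, sum(r['jobCount'] for r in grp))
            for oid, grp in groupby(records, key=get_oid)]

unsorted_totals = totals(rows)                     # 'a' appears twice
sorted_totals = totals(sorted(rows, key=get_oid))  # one entry per oId
```

Without sorting, 'a' ends up in two separate groups; after sorting, each oId appears exactly once.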