I have a list like this:
[{'score': '92', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
{'score': '61', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'}]
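I want to remove duplicate entries based on their imageId. In the example above, imageId 6184de26-e11d-4a7e-9c44-a1af8012d8d0 appears twice, and I want to keep only the entry with the highest score.
How can I do this in Python?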
Answer 0 (score: 0)
I assume you want to keep the entry with the highest score here. Try this:
my_list = [
    {'score': '92', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
    {'score': '61', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'}
]
by_id = {}
for element in my_list:
    imageId = element['imageId']
    if imageId in by_id:
        if int(by_id[imageId]['score']) < int(element['score']):
            # Replace because of higher score
            by_id[imageId] = element
    else:
        # Insert new element
        by_id[imageId] = element
print(list(by_id.values()))
Answer 1 (score: 0)
Use groupby:
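from itertools import groupby
by_image_id = lambda x: x['imageId']
# groupby needs sorted input; compare scores as integers, not strings
new_list = [max(group, key=lambda x: int(x['score'])) for _, group in groupby(sorted(lst, key=by_image_id), key=by_image_id)]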
Running it (Python 3):
In [41]: lst = [{'score': '92', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'}, {'score': '61', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'}]
In [42]: print([max(l, key=lambda x: int(x['score'])) for g, l in groupby(lst, lambda x: x['imageId'])])
[{'score': '92', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'}]
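Two things can go wrong with the groupby approach, so here is a minimal sketch of both. The short ids `id-a` and `id-b` are made-up stand-ins for the real UUIDs, and the third entry is an assumption added for illustration. `groupby` only merges adjacent items, so unsorted input splits an id into several groups, and comparing scores as strings ranks '92' above '761'.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: short ids stand in for the real UUIDs.
lst = [
    {'score': '92', 'imageId': 'id-a', 'label': 'Door'},
    {'score': '61', 'imageId': 'id-b', 'label': 'misc'},
    {'score': '761', 'imageId': 'id-a', 'label': 'Sliding Door'},
]
by_image_id = itemgetter('imageId')

# Pitfall 1: without sorting, 'id-a' shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(lst, key=by_image_id)]

# Pitfall 2: as strings, '92' > '761' because '9' sorts after '7'.
string_best = max(lst, key=itemgetter('score'))

# Sort first, then take the max of each group by integer score.
new_list = [
    max(group, key=lambda entry: int(entry['score']))
    for _, group in groupby(sorted(lst, key=by_image_id), key=by_image_id)
]

print(unsorted_keys)
print(string_best['score'])
print([entry['label'] for entry in new_list])
```

With both fixes in place, each imageId yields exactly one entry, chosen by numeric score.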
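Answer 2 (score: 0)
I suggest extending your example a little:
I would build a dict with the image ID as key and the sub-dict as value. Loop over the input and overwrite the dict entry whenever the new score is greater (don't forget to convert it to an integer).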
my_list = [
    {'score': '192', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
    {'score': '61', 'imageId': 'fffffe26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'misc'},
    {'score': '761', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'},
    {'score': '45', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
]
d = dict()
for subdict in my_list:
    score = int(subdict['score'])
    image_id = subdict['imageId']
    # Store the entry if the id is new or this score beats the stored one
    if image_id not in d or int(d[image_id]['score']) < score:
        d[image_id] = subdict
new_list = list(d.values())
Result (the order may change since we are using a dictionary):
[{'imageId': 'fffffe26-e11d-4a7e-9c44-a1af8012d8d0',
'label': 'misc',
'score': '61'},
{'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0',
'label': 'Sliding Door',
'score': '761'}]
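On the ordering point: from Python 3.7 on, a plain dict keeps keys in insertion order, and overwriting the value of an existing key does not move it. The sketch below wraps the same loop in a function to show that; the name `keep_best` is hypothetical and not part of the answer above.

```python
def keep_best(entries):
    """Keep, for each imageId, the entry with the highest integer score."""
    best = {}
    for entry in entries:
        image_id = entry['imageId']
        # Strict '<' means that on a tie the first-seen entry is kept.
        if image_id not in best or int(best[image_id]['score']) < int(entry['score']):
            best[image_id] = entry
    return list(best.values())


my_list = [
    {'score': '192', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
    {'score': '61', 'imageId': 'fffffe26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'misc'},
    {'score': '761', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'},
    {'score': '45', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
]

result = keep_best(my_list)
# On Python 3.7+ the ids come out in first-seen order.
print([entry['label'] for entry in result])
```

Under that assumption about the Python version, the `6184de26...` entry comes first because its id was seen first, even though the winning entry appeared later in the input.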
Answer 3 (score: 0)
If you have a lot of data, just process it with a pandas.DataFrame (it is cleaner to read and maintain).
import pandas as pd
my_list = [
{'score': '192', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
{'score': '61', 'imageId': 'fffffe26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'misc'},
{'score': '761', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Sliding Door'},
{'score': '45', 'imageId': '6184de26-e11d-4a7e-9c44-a1af8012d8d0', 'label': 'Door'},
]
# create dataframe
df = pd.DataFrame(my_list)
# your score is string! convert it to int
df['score'] = df['score'].astype('int')
# sort values
df = df.sort_values(by=['imageId', 'score'], ascending=False)
# drop duplicates
df = df.drop_duplicates('imageId', keep='first')
                                imageId         label  score
1  fffffe26-e11d-4a7e-9c44-a1af8012d8d0          misc     61
2  6184de26-e11d-4a7e-9c44-a1af8012d8d0  Sliding Door    761