I have a list of dicts like this:
[{'ppm_error': -5.441115144810845e-07, 'key': 'Y7', 'obs_ion': 1054.5045550349998},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1047.547178035},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1381.24928035},
{'ppm_error': -2.5532659838679713e-06, 'key': 'Y4', 'obs_ion': 741.339467035},
{'ppm_error': 1.3036219678359603e-05, 'key': 'Y10', 'obs_ion': 1349.712302035},
{'ppm_error': 3.4259216556970878e-06, 'key': 'Y6', 'obs_ion': 941.424286035},
{'ppm_error': 1.1292770047090912e-06, 'key': 'Y2', 'obs_ion': 261.156025035},
{'ppm_error': 1.1292770047090912e-06, 'key': 'Y2', 'obs_ion': 389.156424565},
{'ppm_error': 9.326980606898406e-06, 'key': 'Y5', 'obs_ion': 667.3107950350001}
]
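I want to remove the dicts with duplicate 'key' values, so that each 'key' appears only once. It does not matter which of the duplicate dicts ends up in the final list. The final list should therefore look like this: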
[{'ppm_error': -5.441115144810845e-07, 'key': 'Y7', 'obs_ion': 1054.5045550349998},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1381.24928035},
{'ppm_error': -2.5532659838679713e-06, 'key': 'Y4', 'obs_ion': 741.339467035},
{'ppm_error': 1.3036219678359603e-05, 'key': 'Y10', 'obs_ion': 1349.712302035},
{'ppm_error': 3.4259216556970878e-06, 'key': 'Y6', 'obs_ion': 941.424286035},
{'ppm_error': 1.1292770047090912e-06, 'key': 'Y2', 'obs_ion': 261.156025035},
{'ppm_error': 9.326980606898406e-06, 'key': 'Y5', 'obs_ion': 667.3107950350001}
]
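Can this be done with the itertools.groupby function, or is there another way to solve it? Any suggestions?
Answer 0 (score: 6)
If order matters, you can use collections.OrderedDict
to collect all the items, for example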
from collections import OrderedDict
print(list(OrderedDict((d["key"], d) for d in my_list).values()))
If order does not matter, you can use a plain dictionary, for example
print(list({d["key"]: d for d in my_list}.values()))
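To make clear which duplicate survives, here is a minimal sketch on a shortened sample (the values are abbreviated from the question, and it assumes Python 3.7+, where plain dicts keep insertion order). The dict approach keeps the last dict for each key; the setdefault variant is an illustrative alternative, not part of the original answer, for keeping the first one instead.

```python
# Shortened sample with the same shape as the data in the question.
my_list = [
    {'key': 'Y7', 'obs_ion': 1054.5},
    {'key': 'Y9', 'obs_ion': 1047.5},
    {'key': 'Y9', 'obs_ion': 1381.2},
    {'key': 'Y2', 'obs_ion': 261.2},
    {'key': 'Y2', 'obs_ion': 389.2},
]

# Later dicts overwrite earlier ones, so the LAST dict per key is kept,
# at the position where that key first appeared.
keep_last = list({d['key']: d for d in my_list}.values())

# setdefault stores a value only when the key is new,
# so the FIRST dict per key is kept.
first_by_key = {}
for d in my_list:
    first_by_key.setdefault(d['key'], d)
keep_first = list(first_by_key.values())

print(keep_last)
print(keep_first)
```

Since the question says it does not matter which duplicate is kept, either variant should do.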
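Answer 1 (score: 2)
Another solution is to remember the keys that have already been processed and return a different result when a key has been seen before. This can be done with a small stateful closure (a form of memoization):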
def get_key_watcher():
    keys_seen = set()
    def key_not_seen(d):
        key = d['key']
        if key in keys_seen:
            return False  # key is not new
        else:
            keys_seen.add(key)
            return True  # key seen for the first time
    return key_not_seen
You can then use it like this:
>>> filtered_dicts = list(filter(get_key_watcher(), dicts))
>>> filtered_dicts
[{'ppm_error': -5.441115144810845e-07, 'obs_ion': 1054.5045550349998, 'key': 'Y7'},
{'ppm_error': 2.3119997582222951e-07, 'obs_ion': 1047.547178035, 'key': 'Y9'},
{'ppm_error': -2.5532659838679713e-06, 'obs_ion': 741.339467035, 'key': 'Y4'},
{'ppm_error': 1.3036219678359603e-05, 'obs_ion': 1349.712302035, 'key': 'Y10'},
{'ppm_error': 3.4259216556970878e-06, 'obs_ion': 941.424286035, 'key': 'Y6'},
{'ppm_error': 1.1292770047090912e-06, 'obs_ion': 261.156025035, 'key': 'Y2'},
{'ppm_error': 9.326980606898406e-06, 'obs_ion': 667.3107950350001, 'key': 'Y5'}]
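As the output shows, this preserves the order of the dicts and keeps the first dict encountered for each key.
Answer 2 (score: 0)
I would do it like this: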
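As for the itertools.groupby part of the question, a rough sketch is below. groupby only groups adjacent items, so the list has to be sorted by 'key' first, which means the result comes out in sorted key order rather than the original order. The sample data is shortened from the question.

```python
from itertools import groupby
from operator import itemgetter

# Shortened sample with the same shape as the data in the question.
dicts = [
    {'key': 'Y7', 'obs_ion': 1054.5},
    {'key': 'Y9', 'obs_ion': 1047.5},
    {'key': 'Y9', 'obs_ion': 1381.2},
    {'key': 'Y2', 'obs_ion': 261.2},
    {'key': 'Y2', 'obs_ion': 389.2},
]

get_key = itemgetter('key')

# sorted() is stable, so dicts sharing a key keep their original relative order.
by_key = sorted(dicts, key=get_key)

# Take the first dict from each run of equal keys.
unique = [next(group) for _, group in groupby(by_key, key=get_key)]

print(unique)
```

The sort makes this O(n log n), whereas the set-based filter above is a single pass, so groupby is probably only worth it if the data is already sorted by key.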
my_list = [...]  # your list (renamed so it does not shadow the built-in list)
finallist = list(dict(map(lambda x: (x['key'], x), my_list)).values())
This is basically the same solution that @thefourtheye gave in his answer...
Answer 3 (score: 0)
Convert it to a numpy array
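import numpy
a = numpy.array([(d["ppm_error"], d["key"], d["obs_ion"]) for d in my_dicts])
# return_index=True also returns the index of the first occurrence of each unique key
mask = numpy.unique(a[:, 1], return_index=True)[1]
uniques = a[mask]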
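Then convert back to dicts (the mixed-type array stores everything as strings, so the numeric fields come back as strings)
labels = ("ppm_error", "key", "obs_ion")  # same column order as used to build the array
unique_entries = [dict(zip(labels, row)) for row in uniques]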