I have two lists of dicts and need to filter the garbage items, the ones whose names can't be recognized, out of data:
data = [
{'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
{'annotation_id': 13, 'record_id': 7, 'name': '----'},
{'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [
{'annotation_id': 13, 'record_id': 7, 'name': '----'}
]
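So in this case I need to remove the entry with annotation_id 13 from data.
I tried iterating over the list and removing the item, but I understand that modifying a list while iterating over it doesn't work well in Python. I also tried a list comprehension, but that failed too. What am I doing wrong? My code is below: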
data = [item for item in data if item['name'] != g['name'] for g in garbage]
The code above creates many duplicate copies of the dicts.
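A likely source of the duplicates is a second for clause over garbage inside the comprehension. The sketch below assumes the comprehension has that shape and uses a hypothetical two-entry garbage list (the 'N/A' entry is made up for illustration). With two for clauses, an item is emitted once for every garbage entry whose name differs from it, so items repeat and the garbage entry itself can still slip through via the entries it does not match. Wrapping the comparison in all() gives one keep/drop decision per item.

```python
data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
# Hypothetical garbage list with two entries, to make the duplication visible.
garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 99, 'record_id': 0, 'name': 'N/A'},
]

# Two for clauses: each item is emitted once per garbage entry whose name
# differs from its own, so items repeat and '----' still gets through once.
nested = [item for item in data for g in garbage if item['name'] != g['name']]
print([d['annotation_id'] for d in nested])  # [22, 22, 13, 12, 12]

# all() collapses the inner loop into a single yes/no answer per item.
fixed = [item for item in data if all(item['name'] != g['name'] for g in garbage)]
print([d['annotation_id'] for d in fixed])  # [22, 12]
```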
Answer 0 (score: 3)
A simple and elegant way to remove specific entries from a list of dicts, where garbage is the list of dict entries to remove from data:
for g in garbage:
    if g in data:
        data.remove(g)
Input data:
data = [
{'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
{'annotation_id': 13, 'record_id': 7, 'name': '----'},
{'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [
{'annotation_id': 13, 'record_id': 7, 'name': '----'}
]
Result:
data = [
{'record_id': 5, 'annotation_id': 22, 'name': 'Joe Young'},
{'record_id': 9, 'annotation_id': 12, 'name': 'Greg Band'}
]
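A self-contained version of this answer is sketched below. One detail worth knowing: list.remove() deletes only the first element that compares equal to its argument (two dicts are equal when all their keys and values match). If data could contain the same garbage dict more than once, which is an assumption not made in the question (the duplicate entry below is hypothetical), a while loop removes every copy.

```python
data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},  # hypothetical duplicate
]
garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'}
]

for g in garbage:
    # remove() drops only the FIRST equal dict, so repeat until none is left.
    while g in data:
        data.remove(g)

print(data)
```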
Answer 1 (score: 1)
You can create a set to hold the garbage names, then filter data against that set of names (if name is the criterion you need to filter on):
garbage_names = {d['name'] for d in garbage}
[item for item in data if item['name'] not in garbage_names]
#[{'annotation_id': 22, 'name': 'Joe Young', 'record_id': 5},
# {'annotation_id': 12, 'name': 'Greg Band', 'record_id': 9}]
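The question actually asks to remove annotation_id 13, so if annotation_id uniquely identifies an entry (an assumption; the question does not say so), the same set-based filtering can key on that field instead of name. The sketch below wraps the idea in a hypothetical helper, drop_matching, that takes a key function; whatever the key returns must be hashable so it can go into a set.

```python
data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'}
]

def drop_matching(data, garbage, key):
    """Hypothetical helper: keep the items of data whose key(item)
    is not produced by any entry in garbage."""
    bad = {key(g) for g in garbage}  # set gives O(1) average lookups
    return [item for item in data if key(item) not in bad]

by_id = drop_matching(data, garbage, key=lambda d: d['annotation_id'])
by_name = drop_matching(data, garbage, key=lambda d: d['name'])
print([d['annotation_id'] for d in by_id])  # [22, 12]
```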
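As pointed out in the comments, you can also follow your original approach with [item for item in data if all(item['name'] != g['name'] for g in garbage)], but because of the double loop it has O(M * N) time complexity (M = len(data), N = len(garbage)), so it is slightly less efficient; using a set reduces the time complexity to O(M + N). Here are some naive timings: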
%timeit [item for item in data if all(item['name'] != g['name'] for g in garbage)]
# 1000000 loops, best of 3: 1.68 µs per loop
%%timeit
garbage_names = {d['name'] for d in garbage}
[item for item in data if item['name'] not in garbage_names]
# 1000000 loops, best of 3: 608 ns per loop
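The timings above use IPython's %timeit magic. Outside IPython, a rough equivalent with the standard-library timeit module might look like the sketch below; the function names with_all and with_set are hypothetical, and the absolute numbers will vary by machine and Python version. With inputs this small the gap is modest; it is expected to widen as garbage grows, since the all() version rescans garbage for every item.

```python
import timeit

data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
]
garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'}
]

def with_all():
    # Double loop: up to M * N name comparisons.
    return [item for item in data if all(item['name'] != g['name'] for g in garbage)]

def with_set():
    # Build the set once (N steps), then one O(1) average lookup per item (M steps).
    garbage_names = {d['name'] for d in garbage}
    return [item for item in data if item['name'] not in garbage_names]

assert with_all() == with_set()

N_CALLS = 10_000
for fn in (with_all, with_set):
    # Best of 3 repeats, like %timeit's "best of 3".
    best = min(timeit.repeat(fn, number=N_CALLS, repeat=3))
    print(f"{fn.__name__}: {best / N_CALLS * 1e9:.0f} ns per loop")
```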
Answer 2 (score: 1)
How about a simple filter?
list(filter(lambda x: x not in garbage, data))
[{'annotation_id': 22, 'name': 'Joe Young', 'record_id': 5},
{'annotation_id': 12, 'name': 'Greg Band', 'record_id': 9}]
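Note that x not in garbage compares whole dicts, so an entry is dropped only if every key and value matches a garbage entry, whereas the set-of-names answer drops anything sharing a garbage name. The sketch below makes the difference visible with a hypothetical extra '----' entry (annotation_id 40, made up for illustration). Since dicts are unhashable, this membership test scans the garbage list for every item, so it is O(M * N) like the all() version.

```python
data = [
    {'annotation_id': 22, 'record_id': 5, 'name': 'Joe Young'},
    {'annotation_id': 13, 'record_id': 7, 'name': '----'},
    {'annotation_id': 12, 'record_id': 9, 'name': 'Greg Band'},
    {'annotation_id': 40, 'record_id': 3, 'name': '----'},  # hypothetical extra entry
]
garbage = [
    {'annotation_id': 13, 'record_id': 7, 'name': '----'}
]

# In Python 3, filter() returns a lazy iterator, so wrap it in list().
# "x not in garbage" compares whole dicts: every key and value must match.
whole_dict = list(filter(lambda x: x not in garbage, data))

# Comparing only the name drops every '----' entry.
garbage_names = {d['name'] for d in garbage}
by_name = [x for x in data if x['name'] not in garbage_names]

print([d['annotation_id'] for d in whole_dict])  # [22, 12, 40]
print([d['annotation_id'] for d in by_name])     # [22, 12]
```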