Hello, and thanks for your help. I have a list of dictionaries that looks like this:
list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
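I need to clean this list so that it contains only unique dictionaries. If two or more entries share the same id, I need to keep the one with the highest air value. If both their air and id are equal, I need to keep the one with source == 'store'. So the result in this case would be: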
list_balls = [{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
I tried the following code to mark the entries that should be removed with keep = False, but it only works when there are exactly two duplicates:
for i in range(0, len(list_balls)):
    if len(list_balls) > 1:
        # print(list_balls[i])
        for j in range(1, len(list_balls)):
            if list_balls[i]['id'] == list_balls[j]['id']:
                if list_balls[i]['air'] > list_balls[j]['air']:
                    list_balls[i]['keep'] = True
                    list_balls[j]['keep'] = False
print(list_balls)
I also don't think this double for loop is the most efficient way to do this, so any other ideas are welcome. Thanks for your help.
Answer 0 (score: 1)
Use itertools.groupby.
For example:
from itertools import groupby
list_balls = [{'source': 'store', 'air': 0.9, 'id': '803371', 'is_used': False}, {'source': 'donation', 'air': 0.2, 'id': '803371', 'is_used': False}, {'source': 'donation', 'air': 0.75, 'id': '30042', 'is_used': False}, {'source': 'store', 'air': 1, 'id': '803371', 'is_used': False}]
#result = [max(list(v), key=lambda x: x["air"]) for k, v in groupby(sorted(list_balls, key=lambda x: x["id"]), lambda x: x["id"])]
result = [max(list(v), key=lambda x: (x["air"], x["source"] == "store")) for k, v in groupby(sorted(list_balls, key=lambda x: x["id"]), lambda x: x["id"])]
print(result)
Output:
[{'air': 0.75, 'id': '30042', 'is_used': False, 'source': 'donation'},
{'air': 1, 'id': '803371', 'is_used': False, 'source': 'store'}]
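One detail worth spelling out: itertools.groupby only groups consecutive items that share a key, which is why the answer sorts by id first. The sketch below illustrates this with made-up sample data; the names balls and by_id are placeholders for illustration and are not part of the answer.

```python
from itertools import groupby

# Hypothetical sample data: the two 'a' entries are not adjacent.
balls = [
    {"id": "a", "air": 0.5},
    {"id": "b", "air": 0.7},
    {"id": "a", "air": 0.9},
]


def by_id(ball):
    """Grouping key: the ball's id."""
    return ball["id"]


# Without sorting, 'a' shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(balls, key=by_id)]

# After sorting by the same key, each id forms exactly one group.
sorted_keys = [key for key, _ in groupby(sorted(balls, key=by_id), key=by_id)]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```

The sort costs O(n log n), which is worth keeping in mind for large lists.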
Answer 1 (score: 1)
Simply do this:
list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
result = {}
for e in list_balls:
    if e['id'] not in result or (
        (e['air'], e['source'] == 'store') >
        (result[e['id']]['air'], result[e['id']]['source'] == 'store')
    ):
        result[e['id']] = e
result_list = list(result.values())
print(result_list)
This prints:
[{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}, {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}]
You can compare tuples directly to compare on several criteria at once. Note that True is always greater than False (1 > 0).
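As a small illustration of the tuple comparison described above, the sketch below checks a few cases by hand. The helper rank and the dictionaries a and b are hypothetical names introduced only for this example.

```python
# Tuples compare element by element, left to right.
assert (1, False) > (0.9, True)   # air decides: 1 > 0.9
assert (1, True) > (1, False)     # air ties, so the boolean decides
assert True == 1 and False == 0   # bool is a subclass of int


def rank(ball):
    """Hypothetical helper: the comparison key (air, is_from_store)."""
    return (ball["air"], ball["source"] == "store")


a = {"id": "803371", "source": "donation", "air": 1}
b = {"id": "803371", "source": "store", "air": 1}

# Both have the same air, so the store entry wins.
best = max([a, b], key=rank)
print(best["source"])  # store
```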
This approach also runs faster than the groupby and defaultdict solutions:
import random
from collections import defaultdict
from itertools import groupby

# Build 10 million random entries
list_balls = []
for _ in range(10000000):
    list_balls.append(
        {
            'source': random.choice(['store', 'donation']),
            'id': random.randint(0, 10000),
            'air': random.randint(0, 4)
        }
    )

def vanilla_filter_list(list_balls):
    result = {}
    for e in list_balls:
        if e['id'] not in result or (
            (e['air'], e['source'] == 'store') >
            (result[e['id']]['air'], result[e['id']]['source'] == 'store')
        ):
            result[e['id']] = e
    return list(result.values())

def groupby_filter_list(list_balls):
    return [max(list(v),
                key=lambda x: (x["air"], x["source"] == "store")) for k, v in groupby(
                    sorted(list_balls, key=lambda x: x["id"]),
                    lambda x: x["id"])]

def collections_filter_list(list_balls):
    d = defaultdict(list)
    for ball in list_balls:
        d[ball["id"]].append(ball)
    return [
        max(group, key=lambda x: (x["air"], x["source"] == "store")) for group in d.values()
    ]

# Each %%time block below is a separate IPython/Jupyter cell.
%%time
vanilla_filter_list(list_balls)  # 5.52s

%%time
groupby_filter_list(list_balls)  # 14.3s

%%time
collections_filter_list(list_balls)  # 8.41s
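The %%time lines only work inside IPython or Jupyter. Outside a notebook, a comparison along the same lines could be sketched with the standard timeit module, as below. This is an illustrative sketch, not the measurement from the answer: the function names, the rank helper, the fixed seed, and the much smaller sample size are all assumptions chosen so the script finishes quickly, and absolute timings will differ by machine.

```python
import random
import timeit
from collections import defaultdict
from itertools import groupby


def rank(ball):
    """Comparison key: highest air first, store wins ties."""
    return (ball["air"], ball["source"] == "store")


def vanilla(balls):
    # Single pass, keeping the best entry seen so far for each id.
    best = {}
    for ball in balls:
        current = best.get(ball["id"])
        if current is None or rank(ball) > rank(current):
            best[ball["id"]] = ball
    return list(best.values())


def with_groupby(balls):
    # groupby needs the input sorted by the grouping key.
    ordered = sorted(balls, key=lambda b: b["id"])
    return [max(group, key=rank)
            for _, group in groupby(ordered, key=lambda b: b["id"])]


def with_defaultdict(balls):
    # Collect all entries per id, then take the max of each group.
    groups = defaultdict(list)
    for ball in balls:
        groups[ball["id"]].append(ball)
    return [max(group, key=rank) for group in groups.values()]


random.seed(0)  # assumption: fixed seed so runs are repeatable
sample = [
    {
        "source": random.choice(["store", "donation"]),
        "id": random.randint(0, 200),
        "air": random.randint(0, 4),
    }
    for _ in range(20000)
]

timings = {}
for func in (vanilla, with_groupby, with_defaultdict):
    timings[func.__name__] = timeit.timeit(lambda: func(sample), number=3)
    print(f"{func.__name__}: {timings[func.__name__]:.4f}s for 3 runs")
```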
Answer 2 (score: 0)
Try this:
all_id = set(i['id'] for i in list_balls)
new_list_balls = []
for id_ in all_id:
    max_air = max(i['air'] for i in list_balls if i['id'] == id_)
    max_air_count = sum(1 for i in list_balls if i['air'] == max_air and i['id'] == id_)
    if max_air_count == 1:
        for i in list_balls:
            if i['id'] == id_ and i['air'] == max_air:
                new_list_balls.append(i)
    else:
        # Tie on air: keep the first entry that comes from the store
        for i in list_balls:
            if i['id'] == id_ and i['air'] == max_air and i['source'] == 'store':
                new_list_balls.append(i)
                break
Output:
[{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
Answer 3 (score: 0)
Here you go:
from collections import defaultdict
list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
grouped_data = defaultdict(list)
for entry in list_balls:
    grouped_data[entry['id']].append(entry)
final_list = []
for k, v in grouped_data.items():
    if len(v) == 1:
        final_list.append(v[0])
    else:
        # sort by air, highest first
        x = sorted(v, key=lambda k1: k1['air'], reverse=True)
        if x[0]['air'] != x[1]['air']:
            final_list.append(x[0])
        else:
            # tie on air: decide by source
            if x[0]['source'] == 'store':
                final_list.append(x[0])
            elif x[1]['source'] == 'store':
                final_list.append(x[1])
for entry in final_list:
    print(entry)
Output:
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}
Answer 4 (score: 0)
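I would first group by id with a defaultdict, then take the dictionary with the maximum air from each group. If there is a tie on air within the same id, source is used as a secondary key for max().
Demo: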
from collections import defaultdict
list_balls = [
    {"id": "803371", "is_used": False, "source": "store", "air": 0.9},
    {"id": "803371", "is_used": False, "source": "donation", "air": 0.20},
    {"id": "30042", "is_used": False, "source": "donation", "air": 0.75},
    {"id": "803371", "is_used": False, "source": "store", "air": 1},
    {"id": "803371", "is_used": False, "source": "donation", "air": 1},
]
d = defaultdict(list)
for ball in list_balls:
    d[ball["id"]].append(ball)
result = [
    max(group, key=lambda x: (x["air"], x["source"] == "store")) for group in d.values()
]
print(result)
Output:
[{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}, {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}]
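If the same group-then-max pattern is needed in more than one place, it could be wrapped in a small helper. The sketch below is one possible shape; best_per_key and its group_key and rank_key parameters are hypothetical names introduced for illustration, not something from the answer.

```python
from collections import defaultdict


def best_per_key(items, group_key, rank_key):
    """Return one item per group: the one with the highest rank_key.

    Hypothetical helper; group_key and rank_key are plain callables.
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_key(item)].append(item)
    # Groups come back in first-seen order (dicts preserve insertion order).
    return [max(group, key=rank_key) for group in groups.values()]


balls = [
    {"id": "803371", "is_used": False, "source": "store", "air": 0.9},
    {"id": "803371", "is_used": False, "source": "donation", "air": 0.20},
    {"id": "30042", "is_used": False, "source": "donation", "air": 0.75},
    {"id": "803371", "is_used": False, "source": "store", "air": 1},
    {"id": "803371", "is_used": False, "source": "donation", "air": 1},
]

result = best_per_key(
    balls,
    group_key=lambda b: b["id"],
    rank_key=lambda b: (b["air"], b["source"] == "store"),
)
print(result)
```

Passing the keys as callables keeps the grouping rule and the tie-breaking rule in one visible place at the call site.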
Answer 5 (score: 0)
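Nothing fancy; this is almost entirely plain Python.
Sort the list of dictionaries by id, then by the negated air value so that the largest comes first, then by source so that entries with store come first. After that, pick the first entry from each group of dictionaries that share the same id.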
import pprint
list_balls = [
    {'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
    {'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
    {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
    {'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}
]
list_balls.sort(key=lambda k: (k['id'], -k['air'], 0 if k['source'] == 'store' else 1))
pprint.pprint([d for i, d in enumerate(list_balls) if i == 0 or list_balls[i - 1]['id'] != d['id']])
Output:
[{'air': 0.75, 'id': '30042', 'is_used': False, 'source': 'donation'},
{'air': 1, 'id': '803371', 'is_used': False, 'source': 'store'}]
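Note that list.sort() reorders the original list in place. If the caller's list must stay untouched, the same sort-then-pick-first idea could be sketched with sorted() and a dictionary, as below. The function name pick_first_per_id is hypothetical, and negating air assumes the values are numeric.

```python
def pick_first_per_id(balls):
    """Sort a copy, then keep the first entry seen for each id."""
    ordered = sorted(
        balls,
        # id ascending, air descending, store (False) before others (True)
        key=lambda b: (b["id"], -b["air"], b["source"] != "store"),
    )
    first_seen = {}
    for ball in ordered:
        # setdefault only stores the first ball encountered for each id
        first_seen.setdefault(ball["id"], ball)
    return list(first_seen.values())


balls = [
    {"id": "803371", "is_used": False, "source": "store", "air": 0.9},
    {"id": "803371", "is_used": False, "source": "donation", "air": 0.20},
    {"id": "30042", "is_used": False, "source": "donation", "air": 0.75},
    {"id": "803371", "is_used": False, "source": "store", "air": 1},
]

result = pick_first_per_id(balls)
print(result)
```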