I have a list of dictionaries like this:
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
The second list contains some integer values:
list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'}]
Now I want to combine 'some_id' and 'data' into a new list wherever 'value' lies between 'lower_range' and 'upper_range' (inclusive). That is, I want the output to be:
list3 = [{'some_id': 1, 'data': 'A'},
         {'some_id': 2, 'data': 'B'},
         {'some_id': 2, 'data': 'C'},
         {'some_id': 3, 'data': 'D'}]
One way to do this is:
list3 = []
for i in list1:
    for j in list2:
        if (j['value'] >= i['lower_range'] and
                j['value'] <= i['upper_range']):
            list3.append({'some_id': i['some_id'], 'data': j['data']})
However, this seems very inefficient. Is there a faster way?
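For reference, here is a minimal sketch of the same nested loop wrapped in a function, so faster versions can be checked against it. The function name `join_by_range_naive` is hypothetical. It performs len(list1) * len(list2) comparisons, which is the O(n*m) cost the question is worried about.

```python
def join_by_range_naive(ranges, records):
    """Nested-loop join from the question: O(len(ranges) * len(records))."""
    result = []
    for r in ranges:
        for rec in records:
            # inclusive bounds, same as the >= / <= test in the question
            if r['lower_range'] <= rec['value'] <= r['upper_range']:
                result.append({'some_id': r['some_id'], 'data': rec['data']})
    return result


list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'}]

print(join_by_range_naive(list1, list2))
```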
Answer 0 (score: 3)
This is a bit verbose, but it is more efficient (O(n log n) < O(n^2)) thanks to the sorting (you can also sort in place with list.sort):
#!/usr/bin/env python
from operator import itemgetter

list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]

list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'}]

# sort before merging so we iterate less (O(nlogn))
list1 = sorted(list1, key=itemgetter('lower_range'))
list2 = sorted(list2, key=itemgetter('value'))

it1 = iter(list1)
it2 = iter(list2)

# merge lists that we know are sorted (simple merging algorithm - O(n))
list3 = []  # defined before the loop so it exists even if either list is empty
try:
    curr_range = next(it1)
    curr_val = next(it2)
    while True:
        value = curr_val['value']
        # inclusive bounds check; unlike `value in range(...)` it also works for non-integers
        if curr_range['lower_range'] <= value <= curr_range['upper_range']:
            # got a match, add it and check if there are more values
            list3.append({'some_id': curr_range['some_id'],
                          'data': curr_val['data']})
            curr_val = next(it2)
        elif value < curr_range['lower_range']:
            # no match, skip to next value
            curr_val = next(it2)
        else:
            # range too low for value, try next one
            curr_range = next(it1)
except StopIteration:
    pass

print(list3)
This gives:
[{'data': 'A', 'some_id': 1},
 {'data': 'B', 'some_id': 2},
 {'data': 'C', 'some_id': 2},
 {'data': 'D', 'some_id': 3}]
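The same sort-then-merge idea can be sketched as a reusable function that walks both sorted lists with two indices instead of using `StopIteration` for control flow. The function name `join_by_range_merge` is hypothetical, and like the answer above it assumes the ranges do not overlap: once a range is passed it is never revisited, so a value that also falls in an earlier, overlapping range would be missed.

```python
from operator import itemgetter


def join_by_range_merge(ranges, records):
    """Two-pointer merge over sorted inputs; assumes non-overlapping ranges."""
    ranges = sorted(ranges, key=itemgetter('lower_range'))
    records = sorted(records, key=itemgetter('value'))
    result = []
    i = j = 0
    while i < len(ranges) and j < len(records):
        rng, rec = ranges[i], records[j]
        value = rec['value']
        if value < rng['lower_range']:
            j += 1  # value lies below the current range: no range can match it
        elif value > rng['upper_range']:
            i += 1  # current range lies below the value: move to the next range
        else:
            result.append({'some_id': rng['some_id'], 'data': rec['data']})
            j += 1  # matched; the next value may fall in the same range
    return result


list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'}]

print(join_by_range_merge(list1, list2))
```

Each step advances exactly one index, so the merge itself does at most len(ranges) + len(records) steps; the two sorts dominate the running time.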
Answer 1 (score: 3)
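There is a special precondition here: the ranges do not overlap. So we can find the only candidate range by searching for the element with the largest lower_range that is still less than or equal to the value.
Binary search reduces the complexity from O(n*n) to O(n log n).
In Python 3 we can use bisect.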
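A small sketch of the search step on the sample lower bounds [3, 8, 13]: `bisect_right` returns the insertion point to the right of any equal keys, so subtracting 1 gives the index of the last lower bound that is less than or equal to the value, and -1 means the value is below every range. The candidate's upper_range still has to be checked afterwards (for example, 17 would also map to index 2 even though it is above 16).

```python
from bisect import bisect_right

# lower bounds of the sample ranges, sorted ascending
lower_ranges = [3, 8, 13]

# for each value, the index of the last range whose lower bound is <= value;
# -1 means the value lies below every range
positions = [bisect_right(lower_ranges, v) - 1 for v in [2, 4, 8, 9, 15]]
print(positions)  # [-1, 0, 1, 1, 2]
```

Note that 8, which equals a lower bound, lands on the range starting at 8; `bisect_left` would instead place it one index too low.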
from bisect import bisect_right

list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]

list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'}]

list3 = []
# sort the ranges by lower bound so their lower bounds can be binary-searched
list1.sort(key=lambda r: r['lower_range'])
lower_ranges = [r['lower_range'] for r in list1]

for record in list2:
    # index of the last range whose lower_range <= value
    position = bisect_right(lower_ranges, record['value']) - 1
    if position < 0:
        continue  # value is below every range
    candidate = list1[position]
    if record['value'] <= candidate['upper_range']:
        list3.append({'some_id': candidate['some_id'], 'data': record['data']})

print(list3)
Output (indented manually):
[{'some_id': 1, 'data': 'A'},
 {'some_id': 2, 'data': 'B'},
 {'some_id': 2, 'data': 'C'},
 {'some_id': 3, 'data': 'D'}]
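Since this answer depends on the ranges not overlapping, it can be worth checking that precondition first. A minimal sketch, with a hypothetical helper `ranges_overlap` and inclusive bounds assumed: after sorting by lower_range, if any two ranges overlap then some pair of neighbours overlaps too, so comparing adjacent ranges is enough.

```python
def ranges_overlap(ranges):
    """Return True if any two inclusive [lower_range, upper_range] intervals share a value."""
    ordered = sorted(ranges, key=lambda r: r['lower_range'])
    # after sorting by lower bound, an overlap must show up between neighbours
    for prev, curr in zip(ordered, ordered[1:]):
        if curr['lower_range'] <= prev['upper_range']:
            return True
    return False


list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]
print(ranges_overlap(list1))  # False

# hypothetical ranges that share the value 8
overlapping = [{'some_id': 1, 'lower_range': 3, 'upper_range': 8},
               {'some_id': 2, 'lower_range': 8, 'upper_range': 12}]
print(ranges_overlap(overlapping))  # True
```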
Answer 2 (score: 2)
You can create a dict that maps values to IDs, like {3: 1, 4: 1, 5: 1, ..., 8: 2, 9: 2, ...}, which lets you find each dict's id in constant O(1) time:
# create a dict that maps values to ids
value_to_id_dict = {}
for dic in list1:
    id_ = dic['some_id']
    for value in range(dic['lower_range'], dic['upper_range'] + 1):
        value_to_id_dict[value] = id_

# look up each dict's id in the dict we just created
list3 = []
for dic in list2:
    id_ = value_to_id_dict.get(dic['value'])
    if id_ is None:
        continue  # value is outside every range; skip it instead of raising KeyError
    new_dic = {'data': dic['data'],
               'some_id': id_}
    list3.append(new_dic)

# result:
# [{'data': 'A', 'some_id': 1},
#  {'data': 'B', 'some_id': 2},
#  {'data': 'C', 'some_id': 2},
#  {'data': 'D', 'some_id': 3}]
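The lookup table trades memory for speed: it stores one entry for every integer covered by any range, so its size grows with the total width of the ranges rather than with the number of ranges. It also only works for integer values, and if ranges overlapped, later ranges would silently overwrite earlier ones. A sketch of the same table built with a dict comprehension, showing its size for the sample data:

```python
list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]

# one entry per integer in every inclusive range (assumes integer values)
value_to_id = {value: r['some_id']
               for r in list1
               for value in range(r['lower_range'], r['upper_range'] + 1)}

print(len(value_to_id))  # 14 entries for three ranges: 5 + 5 + 4
print(value_to_id.get(9), value_to_id.get(17))  # 2 None
```

For a few narrow ranges this is cheap; for wide ranges the binary-search approach keeps memory proportional to the number of ranges instead.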