Quickly check whether numbers in a list fall within given ranges

Date: 2018-05-19 09:18:47

Tags: python python-3.x

I have a list of dictionaries in the following form:

list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]

A second list contains some integers:

list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'}]

I now want to join 'some_id' and 'data' into a new list wherever 'value' lies between 'lower_range' and 'upper_range' (inclusive). That is, I want the output to be

list3 = [{'some_id': 1, 'data': 'A'},
         {'some_id': 2, 'data': 'B'},
         {'some_id': 2, 'data': 'C'},
         {'some_id': 3, 'data': 'D'}]

One way to do this is

list3 = []
for i in list1:
    for j in list2:
        if (j['value'] >= i['lower_range'] and
            j['value'] <= i['upper_range']):
            list3.append({'some_id': i['some_id'], 'data': j['data']})

However, this seems very inefficient. Is there a faster way?

3 answers:

Answer 0 (score: 3)

This is somewhat lengthy, but it is more efficient (O(nlogn) < O(n^2)) because of the sorting (you could also sort in place with list.sort):

#!/usr/bin/env python
from operator import itemgetter

list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]

list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'}]

# sort before merging so we iterate less (O(nlogn))
list1 = sorted(list1, key=itemgetter('lower_range'))
list2 = sorted(list2, key=itemgetter('value'))


it1 = iter(list1)
it2 = iter(list2)

# merge lists that we know are sorted (simple merging algorithm - O(n))
# note: this relies on the ranges not overlapping
list3 = []  # defined before the try so it exists even if a list is empty
try:
    curr_range = next(it1)
    curr_val = next(it2)
    while True:
        value = curr_val['value']
        if value < curr_range['lower_range']:
            # value is below the current range: no match, skip to next value
            curr_val = next(it2)
        elif value > curr_range['upper_range']:
            # range too low for value, try next one
            curr_range = next(it1)
        else:
            # got a match, add it and check if there are more values
            list3.append({'some_id': curr_range['some_id'],
                          'data': curr_val['data']})
            curr_val = next(it2)
except StopIteration:
    # one of the two lists is exhausted, so there is nothing left to match
    pass
print(list3)

which gives:

[{'data': 'A', 'some_id': 1},
 {'data': 'B', 'some_id': 2},
 {'data': 'C', 'some_id': 2},
 {'data': 'D', 'some_id': 3}]
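
As a rough sanity check, the same merge idea can be wrapped in a function and compared with the nested loop from the question on random data. This is only a sketch: the function names merge_join and nested_loop_join, the index-based loop, and the random test data are assumptions made for illustration, and the merge still assumes that the ranges do not overlap.

```python
import random


def nested_loop_join(ranges, values):
    """Reference implementation: the double loop from the question."""
    return [{'some_id': r['some_id'], 'data': v['data']}
            for r in ranges
            for v in values
            if r['lower_range'] <= v['value'] <= r['upper_range']]


def merge_join(ranges, values):
    """Sort both lists, then walk them once (assumes non-overlapping ranges)."""
    ranges = sorted(ranges, key=lambda r: r['lower_range'])
    values = sorted(values, key=lambda v: v['value'])
    result = []
    i = j = 0
    while i < len(ranges) and j < len(values):
        value = values[j]['value']
        if value < ranges[i]['lower_range']:
            j += 1  # value lies before the current range: skip the value
        elif value > ranges[i]['upper_range']:
            i += 1  # value lies past the current range: try the next range
        else:
            result.append({'some_id': ranges[i]['some_id'],
                           'data': values[j]['data']})
            j += 1  # keep the range, more values may fall inside it
    return result


# Build random, non-overlapping ranges (hypothetical test data).
random.seed(0)
ranges, previous_upper = [], 0
for some_id in range(1, 51):
    lower = previous_upper + random.randint(1, 5)  # always past the last range
    upper = lower + random.randint(0, 5)
    ranges.append({'some_id': some_id, 'lower_range': lower, 'upper_range': upper})
    previous_upper = upper

values = [{'value': random.randint(0, previous_upper + 5), 'data': f'item{k}'}
          for k in range(200)]


def sort_key(d):
    """Order results the same way so the two lists can be compared."""
    return (d['some_id'], d['data'])


same = (sorted(merge_join(ranges, values), key=sort_key)
        == sorted(nested_loop_join(ranges, values), key=sort_key))
print(same)
```

The two functions return the matches in a different order (by range versus by value), which is why the results are sorted before they are compared.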

Answer 1 (score: 3)

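There is a special precondition here: the ranges do not overlap. So we can find the single candidate range for a value by searching for the element with the largest lower bound (lower_range) that does not exceed the value.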

Binary search reduces the complexity from O(n*n) to O(n log n). In Python 3, we can use bisect.

list1 = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
         {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
         {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]

list2 = [{'value': 4, 'data': 'A'},
         {'value': 8, 'data': 'B'},
         {'value': 9, 'data': 'C'},
         {'value': 15, 'data': 'D'}]

from bisect import bisect_right

list3 = []

# sort by lower bound so the lower bounds can be binary-searched
list1.sort(key=lambda r: r['lower_range'])
lower_ranges = [r['lower_range'] for r in list1]

for record in list2:
    # index of the last range whose lower bound is <= value
    position = bisect_right(lower_ranges, record['value']) - 1
    if position < 0:
        # value is below every range
        continue
    candidate = list1[position]
    if record['value'] <= candidate['upper_range']:
        list3.append({'some_id': candidate['some_id'], 'data': record['data']})

print(list3)

Output (manually indented)

[{'some_id': 1, 'data': 'A'},
 {'some_id': 2, 'data': 'B'},
 {'some_id': 2, 'data': 'C'},
 {'some_id': 3, 'data': 'D'}]
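
To make the index arithmetic easier to follow, here is a small sketch of what bisect_right returns for the sorted lower bounds of the sample data. The probe values and the positions dict are chosen only for illustration.

```python
from bisect import bisect_right

lower_ranges = [3, 8, 13]  # sorted lower bounds of the three sample ranges

# bisect_right gives the insertion point to the right of any equal element,
# so subtracting 1 yields the index of the last lower bound <= value.
positions = {value: bisect_right(lower_ranges, value) - 1
             for value in (2, 4, 8, 12, 20)}

for value, position in positions.items():
    print(value, position)
# prints:
# 2 -1
# 4 0
# 8 1
# 12 1
# 20 2
```

A position of -1 means the value lies below every range. The value 20 still gets position 2 even though the last range ends at 16, which is why the comparison against upper_range is needed after the lookup.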

Answer 2 (score: 2)

You can create a dict that maps values to ids, like {3: 1, 4: 1, 5: 1, ..., 8: 2, 9: 2, ...}, which lets you find each dict's id in constant O(1) time:

# create a dict that maps values to ids
value_to_id_dict = {}
for dic in list1:
    id_ = dic['some_id']
    for value in range(dic['lower_range'], dic['upper_range']+1):
        value_to_id_dict[value] = id_

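# look up each dict's id in the dict we just created
list3 = []
for dic in list2:
    if dic['value'] not in value_to_id_dict:
        # value is not inside any range, so there is nothing to join
        continue
    new_dic = {'data': dic['data'],
               'some_id': value_to_id_dict[dic['value']]}
    list3.append(new_dic)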

# result:
# [{'data': 'A', 'some_id': 1},
#  {'data': 'B', 'some_id': 2},
#  {'data': 'C', 'some_id': 2},
#  {'data': 'D', 'some_id': 3}]
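
One trade-off worth keeping in mind is that the lookup dict holds one entry per integer covered by any range, so its size grows with the width of the ranges rather than with the number of ranges, and it only works for integer values. A minimal sketch on the sample data, where the names sample_ranges and value_to_id are used only for illustration:

```python
sample_ranges = [{'some_id': 1, 'lower_range': 3, 'upper_range': 7},
                 {'some_id': 2, 'lower_range': 8, 'upper_range': 12},
                 {'some_id': 3, 'lower_range': 13, 'upper_range': 16}]

# One entry per integer inside each inclusive range.
value_to_id = {value: r['some_id']
               for r in sample_ranges
               for value in range(r['lower_range'], r['upper_range'] + 1)}

print(len(value_to_id))     # 14 entries: 5 + 5 + 4 integers
print(value_to_id[9])       # 2
print(value_to_id.get(20))  # None, because 20 is outside every range
```

With ranges that span millions of integers, the bisect or merge approaches may be the more memory-friendly choice.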