Question

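I have a list containing millions of numbers that always increase toward the end, and I need to find and return the numbers within a specified range, for example those greater than X but less than Y. The numbers in the list can change, and the values I'm searching for change as well.

I have been using the method below. Note that this is a basic example; the numbers in my program are not uniform and are not the same as the ones shown here.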

l = [i for i in range(2000000)]
nums = []
for element in l:
    if element > 950004:
        break
    if element > 950000:
        nums.append(element)
#[950001, 950002, 950003, 950004]

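This is fast, but I need my program to run faster still, and the numbers change a lot. So I'm wondering whether there is a better way to do this with a pandas Series or a numpy array. So far all I have done is build an example in numpy: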

import numpy

a = numpy.array(l, dtype=numpy.int64)

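Would a pandas Series be more practical, perhaps using query()? What is the best way to approach this with an array rather than a Python list of Python objects?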

Answer 1

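Here is a solution using binary search. You mention millions of numbers. Technically, binary search makes the algorithm faster by reducing the runtime complexity of locating the bounds to O(log n), if you ignore the final slicing step (the slice itself costs time proportional to the number of elements returned).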

import bisect

l = [i for i in range(2000000)]
lower_bound = 950000
upper_bound = 950004

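# index of the first element >= lower_bound
lower_bound_i = bisect.bisect_left(l, lower_bound)
# index just past the last element <= upper_bound
upper_bound_i = bisect.bisect_right(l, upper_bound, lo=lower_bound_i)
# every element in the closed range [lower_bound, upper_bound]
nums = l[lower_bound_i:upper_bound_i]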
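
Note that the slice above includes the lower bound itself, whereas the loop in the question keeps only values strictly greater than it. A minimal sketch of the strict variant follows, assuming the list is sorted in ascending order. The helper name values_in_range is hypothetical and not part of the answer.

```python
import bisect

def values_in_range(sorted_values, low, high):
    """Return elements x with low < x <= high from an ascending sorted list."""
    # bisect_right skips past any elements equal to `low`,
    # which makes the lower bound exclusive.
    start = bisect.bisect_right(sorted_values, low)
    # bisect_right lands just past the last element equal to `high`,
    # which makes the upper bound inclusive.
    stop = bisect.bisect_right(sorted_values, high, lo=start)
    return sorted_values[start:stop]

numbers = list(range(2000000))
print(values_in_range(numbers, 950000, 950004))
# [950001, 950002, 950003, 950004]
```

Swapping either call for bisect_left flips that bound between inclusive and exclusive.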

Answer 2

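Below are two implementations of binary search (based on the code from here): one searches for the upper bound, the other for the lower bound. Does this work for you?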

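def binary_search_upper(seq, limit):
    """Return the index of the last element <= limit, or -1 if there is none."""
    lo = 0
    hi = len(seq) - 1
    while True:
        if hi < lo:
            return -1
        m = (lo + hi) // 2  # integer division, so m is a valid index in Python 3
        if seq[m] <= limit and (m == len(seq) - 1 or seq[m + 1] > limit):
            return m
        elif seq[m] <= limit:
            lo = m + 1  # a later element is also <= limit, so move right
        else:
            hi = m - 1

def binary_search_lower(seq, limit):
    """Return the index of the first element >= limit, or -1 if there is none."""
    lo = 0
    hi = len(seq) - 1
    while True:
        if hi < lo:
            return -1
        m = (lo + hi) // 2
        if seq[m] >= limit and (m == 0 or seq[m - 1] < limit):
            return m
        elif seq[m] < limit:
            lo = m + 1
        else:
            hi = m - 1  # an earlier element is also >= limit, so move left


l = [i for i in range(2000000)]
print(binary_search_upper(l, 950004))  # 950004
print(binary_search_lower(l, 950000))  # 950000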
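
For comparison, the same two indices can be obtained from the standard library's bisect module. This is only a sketch, assuming an ascending sorted sequence and the convention of returning -1 when no element qualifies. The function names last_index_at_most and first_index_at_least are hypothetical.

```python
import bisect

def last_index_at_most(seq, limit):
    """Index of the last element <= limit, or -1 if every element is larger."""
    # bisect_right returns the insertion point after all elements <= limit,
    # so subtracting 1 gives the last such element (or -1 when there is none).
    return bisect.bisect_right(seq, limit) - 1

def first_index_at_least(seq, limit):
    """Index of the first element >= limit, or -1 if every element is smaller."""
    # bisect_left returns the insertion point before all elements >= limit.
    i = bisect.bisect_left(seq, limit)
    return i if i < len(seq) else -1

numbers = list(range(2000000))
print(last_index_at_most(numbers, 950004))    # 950004
print(first_index_at_least(numbers, 950000))  # 950000
```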

Answer 3

You can use numpy to get a subset of the list with boolean indexing.

import numpy as np
a = np.arange(2000000)
nums = a[(950000<a) & (a<=950004)]
nums
# returns
array([950001, 950002, 950003, 950004])
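
The boolean mask above compares every element, so its cost grows linearly with the array length. When the array is already sorted, as in the question, np.searchsorted can locate the two boundaries by binary search instead. A minimal sketch, assuming an ascending sorted array and the question's bounds (strictly greater than 950000, at most 950004):

```python
import numpy as np

a = np.arange(2000000)

# side='right' returns the insertion point after any elements equal to the value,
# so the lower bound is exclusive and the upper bound is inclusive.
start = np.searchsorted(a, 950000, side='right')
stop = np.searchsorted(a, 950004, side='right')

nums = a[start:stop]  # basic slicing returns a view into `a`, not a copy
print(nums)
# [950001 950002 950003 950004]
```

Because the slice is a view, no data is copied. Call nums.copy() if the result must stay unchanged while the original array is modified.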
