Question

Suppose I have a list L=[1.1, 1.8, 4.4, 5.2]. For some integer n, I want to know whether L contains a value val with n-1<val<n+1, and if so, I want to know the index of val.

The best I can do so far is to define a generator

x = (index for index,val in enumerate(L) if n-1<val<n+1)

and use try ... except to check whether it yields a value. So let's say I'm looking for the smallest n >= 0 for which such a value exists ...

L=[1.1, 1.8, 4.4, 5.2]
n=0
while True:
    x = (index for index,val in enumerate(L) if n-1<val<n+1)
    try:
        index=next(x)
        break
    except StopIteration:
        n+=1
print(n, index)

1 0

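In reality, I'm doing a more complicated task. I'd like to be able to take an n, find the first index, and if it doesn't exist, do something else.

This doesn't seem like particularly clean code to me. Is there a better way? I feel like numpy might have the answer, but I don't know it well enough.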

Answer 1

If L is sorted, you can use bisect.bisect_left to find the index i such that all L[<i] < n <= all L[>=i].

Then

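if i > 0 and n - L[i-1] < 1.0:          # nearest value below n (if there is one)
    val = L[i-1]
elif i < len(L) and L[i] - n < 1.0:     # nearest value at or above n (if there is one)
    val = L[i]
else:
    val = None     # no such value found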
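To make the steps above concrete, here is a minimal self-contained sketch. It assumes you also want the index in the original, unsorted list, so it sorts (value, index) pairs; the name find_near is made up for illustration. Note that it returns the match nearest to n, which is not necessarily the first matching index in list order.

```python
from bisect import bisect_left

def find_near(L, n):
    """Return (index, value) for some value with n-1 < value < n+1, else None.

    The index refers to the original, unsorted list L.
    """
    # Sort (value, original_index) pairs; a real caller would do this once
    # and reuse the result across many probes.
    pairs = sorted((val, idx) for idx, val in enumerate(L))
    # (n,) sorts before every (n, idx), so i is the first position with value >= n.
    i = bisect_left(pairs, (n,))
    # Nearest value strictly below n, if any.
    if i > 0 and n - pairs[i - 1][0] < 1.0:
        val, idx = pairs[i - 1]
        return idx, val
    # Nearest value at or above n, if any.
    if i < len(pairs) and pairs[i][0] - n < 1.0:
        val, idx = pairs[i]
        return idx, val
    return None

L = [1.1, 1.8, 4.4, 5.2]
print(find_near(L, 1))   # 1.1 lies in (0, 2)
print(find_near(L, 3))   # nothing lies in (2, 4)
```

The i > 0 and i < len(pairs) guards matter: without them, a probe below the smallest value would wrap around to the last element through negative indexing, and a probe above the largest value would raise IndexError.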

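Edit: Depending on your data, what you want to accomplish, and how much time you want to spend writing a clever algorithm, sorting may or may not be a good solution for you. Before I see any more O(n)s waved around, I'd like to point out that his actual problem seems to involve repeatedly probing for various values of n, which would quickly amortize the initial sorting overhead, and that his suggested algorithm above is actually O(n**2).

@AntoinePelisse: In any case, let's do some profiling: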

from bisect import bisect_left, bisect_right
from functools import partial
import matplotlib.pyplot as plt
from random import randint, uniform
from timeit import timeit

# blues
density_col_lin = [
    (0.000, 0.000, 0.502, 1.000),
    (0.176, 0.176, 0.600, 1.000),
    (0.357, 0.357, 0.698, 1.000),
    (0.537, 0.537, 0.800, 1.000)
]

# greens
density_col_sor = [
    (0.000, 0.502, 0.000, 1.000),
    (0.176, 0.600, 0.176, 1.000),
    (0.357, 0.698, 0.357, 1.000),
    (0.537, 0.800, 0.537, 1.000)
]

def make_data(length, density):
    max_ = length / density
    return [uniform(0.0, max_) for _ in range(length)], max_

def linear_probe(L, max_, probes):
    for p in range(probes):
        n = randint(0, int(max_))
        for index,val in enumerate(L):
            if n - 1.0 < val < n + 1.0:
                # return index
                break

def sorted_probe(L, max_, probes):
    # initial sort
    sL = sorted((val,index) for index,val in enumerate(L))
    for p in range(probes):
        n = randint(0, int(max_))
        left  = bisect_right(sL, (n - 1.0, max_))
        right = bisect_left (sL, (n + 1.0, 0.0 ), left)
        if left < right:
            index = min(sL[left:right], key=lambda s:s[1])[1]
            # return index

def main():
    densities = [0.8, 0.2, 0.08, 0.02]
    probes    = [1, 3, 10, 30, 100]
    lengths   = [[]                   for d in densities]
    lin_pts   = [[[] for p in probes] for d in densities]
    sor_pts   = [[[] for p in probes] for d in densities]

    # time each function at various data lengths, densities, and probe repetitions
    for d,density in enumerate(densities):
        for trial in range(200):
            print("{}-{}".format(density, trial))

            # length in 10 to 5000, with log density
            length = int(10 ** uniform(1.0, 3.699))
            L, max_ = make_data(length, density)
            lengths[d].append(length)

            for p,probe in enumerate(probes):
                lin = timeit(partial(linear_probe, L, max_, probe), number=5) / 5
                sor = timeit(partial(sorted_probe, L, max_, probe), number=5) / 5
                lin_pts[d][p].append(lin / probe)
                sor_pts[d][p].append(sor / probe)

    # plot the results
    plt.figure(figsize=(9.,6.))
    plt.axis([0, 5000, 0, 0.004])

    for d,density in enumerate(densities):
        xs = lengths[d]
        lcol = density_col_lin[d]
        scol = density_col_sor[d]

        for p,probe in enumerate(probes):
            plt.plot(xs, lin_pts[d][p], "o", color=lcol, markersize=4.0)
            plt.plot(xs, sor_pts[d][p], "o", color=scol, markersize=4.0)

    plt.show()

if __name__ == "__main__":
    main()

This results in:

[image: timing plot]

The x-axis is the number of items in L, and the y-axis is the amortized time per probe; green dots are sorted_probe(), blue dots are linear_probe().

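Conclusions:

the run times of both functions are remarkably linear with respect to length
for a single probe into L, presorting is about 4 times slower than iterating
the crossover point seems to be at about 5 probes; with fewer than that, linear search is faster, and with more, presorting is faster.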

Answer 2

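Here is a solution that doesn't rely on try ... except and is relatively easy to read. As a result it feels "cleaner" to me, but there will always be an element of subjectivity to that.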

def where_within_range( sequence, lower, upper ):
    for index, value in enumerate( sequence ):
        if lower < value < upper: return index

L = [ 1.1, 1.8, 4.4, 5.2 ]

import itertools
for n in itertools.count():
    index = where_within_range( L, n - 1, n + 1 )
    if index is not None: break

print(n, index)

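If you prefer to avoid the overhead of the repeated function call, you could do without the helper function, once again using the StopIteration exception together with itertools.count and a try ... except statement, which (again, "somehow") ends up looking cleaner. Perhaps this is because there is only one statement in each part of the try ... except clause (admittedly, there is not much rational basis for that feeling).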
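A minimal sketch of what such a loop might look like, assuming the generator is inlined into the try clause and an else clause ends the loop. This is a reconstruction from the description, not necessarily the answerer's exact code:

```python
import itertools

L = [1.1, 1.8, 4.4, 5.2]

for n in itertools.count():
    try:
        # First index whose value lies strictly between n-1 and n+1.
        index = next(i for i, value in enumerate(L) if n - 1 < value < n + 1)
    except StopIteration:
        continue   # nothing in (n-1, n+1); try the next n
    else:
        break      # found a match for this n

print(n, index)
```

Like the loop in the question, this never terminates if no n >= 0 has a match (for example, when L is empty), so a real caller would probably want an upper bound on n.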

Answer 3

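I had an interesting idea: by using a defaultdict and building an index from the values (n-1) and (n+1), you only need to loop over the list once and can then simply compare keys/values, as follows: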

from collections import defaultdict

L = [1.1, 1.8, 4.4, 5.2]

x = defaultdict(dict)
for idx, item in enumerate(L):
    x[int(item)] = {int(item-1): item-1, int(item+1): item+1, 'index':idx}

Usage:

n = 5

x[n].get(n-1) < n < x[n].get(n+1) and x[n]['index']
Out[8]: 3

n = 2

x[n].get(n-1) < n < x[n].get(n+1) and x[n]['index']
Out[10]: False

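Explanation:

It works like this:

1) True and index returns index

2) False and index returns False

Because you pass in n as an integer, if the first part is True, the expression returns the second part, the index value. If the first part fails, it returns False.

This returns the LAST occurrence for n; if you need the FIRST occurrence, just reverse the list and the index: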

...
l = len(L)
for idx, item in enumerate(reversed(L)):
    x[int(item)] = {int(item-1): item-1, 
                    int(item+1): item+1, 
                    'index': l-idx-1}
...
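The usage lines above compare against None for a missing key, which only works on Python 2; Python 3 raises TypeError there. Below is a hedged sketch of the same one-pass indexing idea that runs on Python 3. It assumes values are bucketed by their floor, and the names build_index and first_index_near are made up for illustration. Because any value in (n-1, n+1) has floor n-1 or n, it checks both buckets, so for n = 5 it reports index 2 (value 4.4) rather than 3.

```python
from collections import defaultdict
from math import floor

def build_index(L):
    """Map floor(value) -> list of (index, value) pairs, in list order."""
    buckets = defaultdict(list)
    for idx, val in enumerate(L):
        buckets[floor(val)].append((idx, val))
    return buckets

def first_index_near(buckets, n):
    """Return the first index whose value lies in (n-1, n+1), or None."""
    # Only buckets n-1 and n can hold matches; the range test drops an exact n-1.
    candidates = [idx
                  for idx, val in buckets.get(n - 1, []) + buckets.get(n, [])
                  if n - 1 < val < n + 1]
    return min(candidates) if candidates else None

L = [1.1, 1.8, 4.4, 5.2]
index = build_index(L)          # one pass over the list
print(first_index_near(index, 5))
print(first_index_near(index, 3))
```

Using buckets.get(...) rather than buckets[...] avoids silently creating empty entries for every n that is probed.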

Answer 4

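Now that I think about it, I finally understand your task: just find the minimum value in the array and its index; n will then be equal to int(minimum), that is, the minimum rounded down. Or even simpler: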

n,index = int(min(L)),L.index(min(L))
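A quick way to sanity-check this shortcut is to compare it with the explicit loop from the question. The sketch below assumes all values are non-negative (for values of -1 or less, int(min(L)) is no longer the smallest n >= 0 that works); both function names are made up for illustration.

```python
import itertools

def smallest_n_loop(L):
    """The question's search: smallest n >= 0 with some value in (n-1, n+1)."""
    for n in itertools.count():
        for index, val in enumerate(L):
            if n - 1 < val < n + 1:
                return n, index

def smallest_n_min(L):
    """The shortcut: take the minimum and truncate it to an integer."""
    m = min(L)
    return int(m), L.index(m)

print(smallest_n_loop([1.1, 1.8, 4.4, 5.2]), smallest_n_min([1.1, 1.8, 4.4, 5.2]))
print(smallest_n_loop([1.8, 1.1]), smallest_n_min([1.8, 1.1]))
```

The two agree on n, but the index can differ: for [1.8, 1.1] the loop reports index 0, because 1.8 is the first value inside (0, 2), while the shortcut reports the index of the minimum.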

Answer 5

l can be a list or a numpy array:

next(((i,v) for i,v in enumerate(l) if n-1<v<n+1))

This uses a generator and stops at the first matching value.
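As written, next() raises StopIteration when nothing matches. Since the question wants to "do something else" in that case, one option is the two-argument form of next(), which returns a default instead. A small sketch, with first_match as a made-up helper name:

```python
def first_match(l, n, default=None):
    """Return (index, value) of the first v with n-1 < v < n+1, else default."""
    return next(((i, v) for i, v in enumerate(l) if n - 1 < v < n + 1), default)

L = [1.1, 1.8, 4.4, 5.2]

match = first_match(L, 3)
if match is None:
    print("no value near 3")   # the "do something else" branch
else:
    print("found", match)
```

This keeps the early exit of the generator and avoids try ... except entirely.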

Test whether a list contains a number in some range
