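Efficiently finding the next greater element in another array

Time: 2019-06-05 19:40:17

Tags: python-3.x numpy

Is it possible to remove the for loops in this function and speed it up? I could not get the same results with a vectorized approach. Or is there another option?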

import numpy as np

indices = np.array(
    [814, 935, 1057, 3069, 3305, 3800, 4093, 4162, 4449])

within = np.array(
    [193, 207, 243, 251, 273, 286, 405, 427, 696,
     770, 883, 896, 1004, 2014, 2032, 2033, 2046, 2066,
     2079, 2154, 2155, 2156, 2157, 2158, 2159, 2163, 2165,
     2166, 2167, 2183, 2184, 2208, 2210, 2212, 2213, 2221,
     2222, 2223, 2225, 2226, 2227, 2281, 2282, 2338, 2401,
     2611, 2612, 2639, 2640, 2649, 2700, 2775, 2776, 2785,
     3030, 3171, 3191, 3406, 3427, 3527, 3984, 3996, 3997,
     4024, 4323, 4331, 4332])


def get_first_ind_after(indices, within):
    """Return the first value in within that is greater than each value in indices.

    indices and within must be sorted ascending
    """
    first_after_leading = []
    for index in indices:
        for w_ind in within:
            if w_ind > index:
                first_after_leading.append(w_ind)
                break

    # convert to np array
    first_after_leading = np.array(first_after_leading).flatten()

    # np.unique removes duplicate values from the result
    return np.unique(first_after_leading)

Given an array of indices, the function should return, for each index, the next larger number in within.

# Output:
[ 883 1004 2014 3171 3406 3984 4323]

2 answers:

Answer 0: (score: 1)

Try this:

[within[within>x][0] if len(within[within>x])>0 else 0 for x in indices]

In [35]: import numpy as np
    ...: indices = np.array([814, 935, 1057, 3069, 3305, 3800, 4093, 4162, 4449])
    ...:
    ...: within = np.array(
    ...:     [193, 207, 243, 251, 273, 286, 405, 427, 696,
    ...:      770, 883, 896, 1004, 2014, 2032, 2033, 2046, 2066,
    ...:      2079, 2154, 2155, 2156, 2157, 2158, 2159, 2163, 2165,
    ...:      2166, 2167, 2183, 2184, 2208, 2210, 2212, 2213, 2221,
    ...:      2222, 2223, 2225, 2226, 2227, 2281, 2282, 2338, 2401,
    ...:      2611, 2612, 2639, 2640, 2649, 2700, 2775, 2776, 2785,
    ...:      3030, 3171, 3191, 3406, 3427, 3527, 3984, 3996, 3997,
    ...:      4024, 4323, 4331, 4332])

In [36]: [within[within>x][0] if len(within[within>x])>0 else 0 for x in indices]
Out[36]: [883, 1004, 2014, 3171, 3406, 3984, 4323, 4323, 0]

This is the Pythonic approach called a list comprehension, which is a compact form of a for-each loop. Expanded, it looks like this:

result = []
for x in indices:
    # Boolean indexing: this returns all of the items in the array that have a value greater than x
    y = within[within>x]
    # At this point, y is an array of all the items which are larger than x. Since you wanted the first of these items, we just take the first item of this new array. y may be empty (no values match the condition), so there is a check for that
    if len(y) > 0:
        z = y[0]
    else:
        z = 0 # or None or whatever you like
    # Now add this value to the list that we are building
    result.append(z)
# Now result holds the list

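I wrote it this way because it uses a vectorized operation (the boolean mask) and also takes advantage of a list comprehension, which is a more concise way to write a for-each loop that returns a list.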
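The list above still contains a repeated value (4323) and a 0 placeholder, so it does not match the output requested in the question. A minimal sketch of one way to reduce it to that form follows. The helper name `first_greater_masked` and the shortened `within` array are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def first_greater_masked(indices, within):
    """Hypothetical helper: first element of `within` greater than each index.

    Assumes `within` is sorted ascending. Indices with no greater element
    are skipped instead of being filled with a placeholder such as 0.
    """
    result = []
    for x in indices:
        greater = within[within > x]  # boolean mask, computed once per index
        if greater.size > 0:
            result.append(greater[0])
    # np.unique drops the duplicates that arise when several indices
    # share the same next-greater element
    return np.unique(result)

indices = np.array([814, 935, 1057, 3069, 3305, 3800, 4093, 4162, 4449])
# shortened version of the question's array (assumption, for brevity)
within = np.array([770, 883, 896, 1004, 2014, 3030, 3171, 3191,
                   3406, 3984, 4024, 4323, 4332])

print(first_greater_masked(indices, within))
# [ 883 1004 2014 3171 3406 3984 4323]
```

Each iteration still scans all of `within`, so the cost grows with len(indices) * len(within).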

Answer 1: (score: 1)

Here is one based on np.searchsorted -

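def next_greater(indices, within):
    # side='right' finds the first element strictly greater than each index;
    # the default side='left' would also match an element equal to the index
    idx = np.searchsorted(within, indices, side='right')
    # drop indices that have no greater element in within
    idxv = idx[idx<len(within)]
    idxv_unq = np.unique(idxv)
    return within[idxv_unq]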
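One detail worth checking is how np.searchsorted treats a query value that also occurs in within. The sketch below uses small made-up arrays (not the question's data) to compare the two settings of the `side` parameter: 'left', the default, gives the position of the first element greater than or equal to the query, while 'right' gives the position of the first element strictly greater.

```python
import numpy as np

# made-up data for illustration only
within = np.array([10, 20, 30, 40])
queries = np.array([5, 20, 25, 40])

left = np.searchsorted(within, queries, side="left")
right = np.searchsorted(within, queries, side="right")

print(left)   # [0 1 2 3]
print(right)  # [0 2 2 4]

# An index equal to len(within) means no suitable element exists,
# which is why the function filters with idx < len(within).
valid = right[right < len(within)]
print(within[valid])  # [10 30 30]
```

In the question's data no value of indices appears in within, so both settings give the same result there.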

Alternatively, idxv_unq can be computed like this, which should be more efficient -

idxv_unq = idxv[np.r_[True,idxv[:-1] != idxv[1:]]]
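Put together, the alternative could look like the sketch below. The function name `next_greater_sorted_dedup`, the empty-input guard, the use of side="right" and the shortened `within` array are assumptions added for illustration. The trick relies on idxv being non-decreasing, which holds when indices is sorted ascending, so equal entries are always adjacent.

```python
import numpy as np

def next_greater_sorted_dedup(indices, within):
    """Hypothetical variant that deduplicates without np.unique.

    Assumes `indices` and `within` are both sorted ascending.
    """
    # side="right" gives strictly-greater matches (assumption; the default
    # side="left" would also match values equal to an index)
    idx = np.searchsorted(within, indices, side="right")
    idxv = idx[idx < len(within)]
    if idxv.size == 0:
        # guard: the mask below would have length 1 and fail on an empty idxv
        return within[:0]
    # keep the first entry and every entry that differs from its predecessor
    keep = np.r_[True, idxv[:-1] != idxv[1:]]
    return within[idxv[keep]]

indices = np.array([814, 935, 1057, 3069, 3305, 3800, 4093, 4162, 4449])
# shortened version of the question's array (assumption, for brevity)
within = np.array([770, 883, 896, 1004, 2014, 3030, 3171, 3191,
                   3406, 3984, 4024, 4323, 4332])

print(next_greater_sorted_dedup(indices, within))
# [ 883 1004 2014 3171 3406 3984 4323]
```

Comparing neighbours is a single linear pass, whereas np.unique sorts its input first, which is the likely source of the speed difference.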