Question

Given a numpy array of the following form:

x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]

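Is there a way to keep the top 3 values in each row and set the others to zero in Python (without an explicit loop)? For the example above, the result would be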

x = [[4.,3.,0.,0.,8.],[0.,3.1,0.,9.2,5.5],[0.0,7.0,4.4,0.0,1.3]]

Sample code

import numpy as np
arr = np.array([1.2,3.1,0.,9.2,5.5,3.2])
indexes=arr.argsort()[-3:][::-1]
a = list(range(6))
A=set(indexes); B=set(a)
zero_ind=(B.difference(A)) 
arr[list(zero_ind)]=0

Output:

array([0. , 0. , 0. , 9.2, 5.5, 3.2])

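Above is my sample code (several lines) for a 1-D numpy array. Looping over every row of a numpy array and repeating the same computation would be very expensive. Is there a simpler way?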

Answer 1

Here is one approach that uses a list comprehension to iterate over the rows and apply the keep_top_3 function to each

import numpy as np
import heapq

def keep_top_3(arr): 
    smallest = heapq.nlargest(3, arr)[-1]  # find the top 3 and use the smallest as cut off
    arr[arr < smallest] = 0 # replace anything lower than the cut off with 0
    return arr 

x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]
result = [keep_top_3(np.array(arr)) for arr  in x]

I hope this helps :)

Answer 2

Use np.apply_along_axis to apply a function to 1-D slices along the given axis

import numpy as np

def top_k_values(array):
    indexes = array.argsort()[-3:][::-1]
    A = set(indexes)
    B = set(list(range(array.shape[0])))
    array[list(B.difference(A))]=0
    return array

arr = np.array([[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]])
result = np.apply_along_axis(top_k_values, 1, arr)
print(result)
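
The per-row helper can also be written without sets. The following is a minimal sketch, not the code from this answer: keep_top_k_1d is a hypothetical name, and k is passed as an extra argument that np.apply_along_axis forwards to the helper. The helper returns a new row instead of modifying its input, so arr is left untouched. Note that np.apply_along_axis still calls the helper once per row from Python, so it is a convenience rather than a true vectorization.

```python
import numpy as np

def keep_top_k_1d(row, k):
    # hypothetical helper: returns a copy of `row` with everything
    # outside the k largest values set to zero
    out = np.zeros_like(row)
    idx = np.argpartition(row, -k)[-k:]   # indices of the k largest values (unsorted)
    out[idx] = row[idx]
    return out

arr = np.array([[4., 3., 2., 1., 8.],
                [1.2, 3.1, 0., 9.2, 5.5],
                [0.2, 7.0, 4.4, 0.2, 1.3]])

# extra positional arguments (here k=3) are forwarded to the helper
result = np.apply_along_axis(keep_top_k_1d, 1, arr, 3)
print(result)
```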

Answer 3

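Here is a fully vectorized version with no third-party dependencies other than numpy. It uses numpy's argpartition to find the k-th value efficiently. For other use cases, see for example this answer.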

import numpy

def truncate_top_k(x, k, inplace=False):
    m, n = x.shape
    # get (unsorted) indices of top-k values
    topk_indices = numpy.argpartition(x, -k, axis=1)[:, -k:]
    # get k-th value
    rows, _ = numpy.indices((m, k))
    kth_vals = x[rows, topk_indices].min(axis=1)
    # get boolean mask of values smaller than k-th
    is_smaller_than_kth = x < kth_vals[:, None]
    # replace mask by 0
    if not inplace:
        return numpy.where(is_smaller_than_kth, 0, x)
    x[is_smaller_than_kth] = 0
    return x
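
If only the per-row threshold is needed, np.partition can return the k-th largest value directly, which avoids the index bookkeeping. This is a sketch of that variant, not a replacement for the function above: truncate_top_k_threshold is a hypothetical name, and it assumes a 2-D array with k no larger than the number of columns.

```python
import numpy as np

def truncate_top_k_threshold(x, k):
    # k-th largest value of each row, kept as a column so it broadcasts over the row
    kth_vals = np.partition(x, -k, axis=1)[:, -k][:, None]
    # keep values >= the row threshold, zero the rest (returns a new array)
    return np.where(x >= kth_vals, x, 0)

x = np.array([[4., 3., 2., 1., 8.],
              [1.2, 3.1, 0., 9.2, 5.5],
              [0.2, 7.0, 4.4, 0.2, 1.3]])
result = truncate_top_k_threshold(x, 3)
print(result)
```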

Answer 4

import numpy as np

def top_k(arr, k, axis=0):
    # indices of the top k values along axis (unsorted)
    top_k_idx = np.take(np.argpartition(arr, -k, axis=axis),
                        np.arange(-k, 0),
                        axis=axis)
    out = np.zeros_like(arr)                        # create zero array
    np.put_along_axis(out, top_k_idx,               # put idx values of arr in out
                      np.take_along_axis(arr, top_k_idx, axis=axis),
                      axis=axis)
    return out

This should work for an arbitrary axis and k, but it does not operate in place. If you want an in-place version, it is simpler:

def top_k(arr, k, axis=0):
    # indices of everything outside the top k along axis
    remove_idx = np.take(np.argpartition(arr, -k, axis=axis),
                         np.arange(arr.shape[axis] - k),
                         axis=axis)                      # indices to remove
    np.put_along_axis(arr, remove_idx, 0, axis=axis)     # put 0 in indices
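
One difference between the index-based approach here and a threshold-based approach (comparing against the k-th largest value) shows up when values tie at the cutoff. The sketch below uses a made-up row to illustrate it: the threshold keeps every value equal to the k-th largest, while the index-based version keeps exactly k entries.

```python
import numpy as np

row = np.array([[1., 2., 2., 2., 3.]])   # illustrative data with a tie at the cutoff
k = 2

# threshold-based: keep everything >= the k-th largest value
kth = np.partition(row, -k, axis=1)[:, -k][:, None]
by_threshold = np.where(row >= kth, row, 0)

# index-based: keep exactly the k positions returned by argpartition
idx = np.argpartition(row, -k, axis=1)[:, -k:]
by_index = np.zeros_like(row)
np.put_along_axis(by_index, idx, np.take_along_axis(row, idx, axis=1), axis=1)

print(np.count_nonzero(by_threshold))  # 4: all three 2s survive along with the 3
print(np.count_nonzero(by_index))      # 2: exactly k entries survive
```

Which behavior is preferable depends on whether "top k" should mean exactly k entries or every entry at least as large as the k-th.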

Is there a way to get the top k values of each row of a numpy array (Python)?
