Question

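Consider a 2D NumPy integer array in which some entries are 0 (array1). Consider a second 2D array (array2) whose first column contains the same nonzero values as array1, and which has another column (say, index 2) holding different numeric (float) values.

How can I create a new array3 by replacing each nonzero entry of array1 with the corresponding value from column 2 of array2? What is the cleanest way to do this?

Example: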

>>> array1
array([[0, 27, 43, 10],
       [0, 80, 15,  2],
       [0,  3,  6,  9]])

>>> array2
array([[ 10.,  4., 88.],
       [  2.,  2., 95.],
       [  9.,  2., 65.],
       [ 43.,  1., 62.],
       [ 15.,  5., 64.],
       [  6.,  6., 67.],
       [ 27.,  5., 62.],
       [ 80.,  8., 73.],
       [  3.,  9., 59.]])

>>> array3
array([[0., 62., 62., 88.],
       [0., 73., 64., 95.],
       [0., 59., 67., 65.]])

Answer 1

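If the first column of array2 is sorted and holds the consecutive values 1, 2, ..., n, you can combine boolean indexing with advanced NumPy array indexing: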

array3 = array1.astype(float) # this copies the array by default.
array3[array1 != 0] = array2[array1[array1 != 0]-1, 2]

The result is:

array([[  0.,  62.,  62.,  88.],
       [  0.,  73.,  64.,  95.],
       [  0.,  59.,  67.,  65.]])

Explanation

First create a boolean array that marks where the nonzero entries are:

>>> non_zero_mask = array1 != 0
array([[False,  True,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True]], dtype=bool)

This mask is used to find the elements that should be replaced.

Next you need the values of those elements:

>>> non_zero_values = array1[non_zero_mask]
array([7, 4, 1, 8, 5, 2, 9, 6, 3])

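Because array2 is sorted and its first column starts at 1, subtracting one from each value gives the row that holds its replacement. Note that the outputs in this walk-through come from an array2 whose first column runs from 1 to 9 in order; the unsorted array2 shown in the question needs the more general approach described later in this answer. If your array2 is not sorted, you may need to sort it first or add another indexing step in between: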

>>> replacement_rows = array2[non_zero_values-1]
array([[  7.,   7.,  62.],
       [  4.,   4.,  62.],
       [  1.,   1.,  88.],
       [  8.,   8.,  73.],
       [  5.,   5.,  64.],
       [  2.,   2.,  95.],
       [  9.,   9.,  59.],
       [  6.,   6.,  67.],
       [  3.,   3.,  65.]])

>>> replacement_values = array2[non_zero_values-1, 2] # third element of that row!
array([ 62.,  62.,  88.,  73.,  64.,  95.,  59.,  67.,  65.])

Then simply assign these values to the original array or to a new one:

array3[non_zero_mask] = replacement_values
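
The steps above can be put together in one runnable sketch. The input arrays here are an assumption: they are reconstructed from the intermediate outputs shown in this walk-through (first column of array2 running from 1 to 9), not copied from the question.

```python
import numpy as np

# Assumed inputs, reconstructed from the outputs in the walk-through:
# the first column of array2 is sorted and holds 1..9.
array1 = np.array([[0, 7, 4, 1],
                   [0, 8, 5, 2],
                   [0, 9, 6, 3]])
array2 = np.array([[1., 1., 88.],
                   [2., 2., 95.],
                   [3., 3., 65.],
                   [4., 4., 62.],
                   [5., 5., 64.],
                   [6., 6., 67.],
                   [7., 7., 62.],
                   [8., 8., 73.],
                   [9., 9., 59.]])

non_zero_mask = array1 != 0                 # where to replace
non_zero_values = array1[non_zero_mask]     # which values to look up
# Value k lives in row k - 1 because the first column is 1, 2, ..., 9.
replacement_values = array2[non_zero_values - 1, 2]

array3 = array1.astype(float)               # astype returns a copy
array3[non_zero_mask] = replacement_values
print(array3)
```

With these inputs the result matches the array3 shown at the top of the answer.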

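This approach relies on array2 being sorted, so it breaks down for more complicated layouts. In that case you either need to work out the relationship between a value and its row index and use that instead of the simple -1 I used, or add another intermediate np.where / boolean indexing step.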
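
One possible way to derive the value-to-row relationship for an unsorted array2 is to sort its first column once and look values up with np.searchsorted. This is only a sketch, not part of the original answer, and it assumes that every nonzero value in array1 occurs exactly once in the first column of array2; the variable names are illustrative.

```python
import numpy as np

array1 = np.array([[0, 27, 43, 10],
                   [0, 80, 15,  2],
                   [0,  3,  6,  9]])
array2 = np.array([[10., 4., 88.],
                   [ 2., 2., 95.],
                   [ 9., 2., 65.],
                   [43., 1., 62.],
                   [15., 5., 64.],
                   [ 6., 6., 67.],
                   [27., 5., 62.],
                   [80., 8., 73.],
                   [ 3., 9., 59.]])

keys = array2[:, 0]
order = np.argsort(keys)          # row indices of array2 in key order
sorted_keys = keys[order]

mask = array1 != 0
# Position of each nonzero value within the sorted keys
# (assumption: every value is present in keys).
pos = np.searchsorted(sorted_keys, array1[mask])

array3 = array1.astype(float)
# order[pos] maps the sorted position back to the original row of array2.
array3[mask] = array2[order[pos], 2]
print(array3)
```

The temporary arrays here have the size of array2's first column or of the nonzero part of array1, so memory use stays linear.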

Extension

If your array2 is not sorted and you cannot sort it, you can do the following:

>>> array3 = array1.astype(float)
>>> array3[array1 != 0] = array2[np.where(array2[:, 0][None, :] == array1[array1 != 0][:, None])[1], 2]
>>> array3
array([[  0.,  62.,  62.,  88.],
       [  0.,  73.,  64.,  95.],
       [  0.,  59.,  67.,  65.]])

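Because this works by broadcasting the arrays against each other, it creates a temporary boolean array with one row per nonzero entry of array1 and one column per row of array2, so up to array1.size * array2.shape[0] elements. It may therefore not be very efficient, but it is still fully vectorized.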
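
To make the size of that temporary visible, the one-liner can be split into steps. This is a sketch using the example arrays from the question; the intermediate names (matches, rows) are introduced here for illustration.

```python
import numpy as np

array1 = np.array([[0, 27, 43, 10],
                   [0, 80, 15,  2],
                   [0,  3,  6,  9]])
array2 = np.array([[10., 4., 88.],
                   [ 2., 2., 95.],
                   [ 9., 2., 65.],
                   [43., 1., 62.],
                   [15., 5., 64.],
                   [ 6., 6., 67.],
                   [27., 5., 62.],
                   [80., 8., 73.],
                   [ 3., 9., 59.]])

mask = array1 != 0
# Compare every nonzero value (as a column) with every key (as a row).
matches = array2[:, 0][None, :] == array1[mask][:, None]
print(matches.shape)   # (9, 9): nonzero entries x rows of array2

# np.where returns (row indices, column indices); the column index of each
# True is the row of array2 that holds the matching key.
rows = np.where(matches)[1]

array3 = array1.astype(float)
array3[mask] = array2[rows, 2]
print(array3)
```

This relies on each nonzero value matching exactly one key; with duplicates or missing keys, rows would not have one entry per nonzero value and the assignment would fail.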

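Numba (if you want speed)

Numba is great when you need to speed up something slow for which no native NumPy or SciPy function exists. If you use Anaconda it is probably already installed (with plain conda you may need to install it first), so it may be a viable option: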

import numba as nb
import numpy as np

@nb.njit
def nb_replace_values(array, old_new_array):
    res = np.zeros(array.shape, dtype=np.float64)

    rows = array.shape[0]
    columns = array.shape[1]
    rows_replace_array = old_new_array.shape[0]

    for row in range(rows):
        for column in range(columns):
            val = array[row, column]
            # only replace values that are not zero
            if val != 0:
                # Find the value to replace the element with
                for ind_replace in range(rows_replace_array):
                    if old_new_array[ind_replace, 0] == val:
                        # Match found. Replace and break the innermost loop
                        res[row, column] = old_new_array[ind_replace, 2]
                        break

    return res

nb_replace_values(array1, array2)
array([[  0.,  62.,  62.,  88.],
       [  0.,  73.,  64.,  95.],
       [  0.,  59.,  67.,  65.]])

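Especially for large arrays, this is clearly the fastest and most memory-efficient of the solutions here, because it creates no temporary arrays. The first call is much slower because the function has to be compiled just in time.

Timings: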

%timeit nb_replace_values(array1, array2)

100000 loops, best of 3: 6.23 µs per loop

%%timeit
array3 = array1.astype(float)
array3[array1 != 0] = array2[np.where(array2[:, 0][None, :] == array1[array1 != 0][:, None])[1], 2]

10000 loops, best of 3: 74.8 µs per loop

# Solution provided by @PDRX
%%timeit 
array3 = array1.astype(float)
for i in array2[:,0]:
    i_arr1,j_arr1 = np.where(array1 == i)
    i_arr2 = np.where(array2[:,0] == i)
    array3[i_arr1,j_arr1] = array2[i_arr2,2]

1000 loops, best of 3: 689 µs per loop
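
Outside IPython, a comparable measurement can be made with the standard-library timeit module. The sketch below times only the broadcasting variant; the wrapper function broadcast_replace and the loop counts are choices made for this illustration, and the numbers will differ from machine to machine.

```python
import timeit
import numpy as np

array1 = np.array([[0, 27, 43, 10],
                   [0, 80, 15,  2],
                   [0,  3,  6,  9]])
array2 = np.array([[10., 4., 88.],
                   [ 2., 2., 95.],
                   [ 9., 2., 65.],
                   [43., 1., 62.],
                   [15., 5., 64.],
                   [ 6., 6., 67.],
                   [27., 5., 62.],
                   [80., 8., 73.],
                   [ 3., 9., 59.]])

def broadcast_replace(a1, a2):
    """Hypothetical wrapper around the broadcasting one-liner."""
    mask = a1 != 0
    out = a1.astype(float)
    rows = np.where(a2[:, 0][None, :] == a1[mask][:, None])[1]
    out[mask] = a2[rows, 2]
    return out

n_calls = 1000
# Take the best of 3 repeats, like %timeit reported in the runs above.
best = min(timeit.repeat(lambda: broadcast_replace(array1, array2),
                         number=n_calls, repeat=3))
print(f"{best / n_calls * 1e6:.1f} µs per loop")
```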

Answer 2

I'm not sure I understand what you are asking, but let's try list comprehensions:

array3 = [[array2[subitem1 - 1][2] if subitem1 != 0 else 0 for subitem1 in subarray1] for subarray1 in array1]

That is hard to read, though, so I prefer to format it like this:

array3 = [
    [
        array2[subitem1 - 1][2] if subitem1 != 0 else 0
        for subitem1 in subarray1
    ]
    for subarray1 in array1
]
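
The comprehension above indexes array2 by value - 1, so it only works when the first column of array2 is 1, 2, ..., n in order. For the unsorted example in the question, one possible variant is to build a dictionary from key to replacement value first. This is a sketch, not part of the original answer; the name lookup is introduced here, and it assumes every nonzero value of array1 appears in the first column of array2.

```python
import numpy as np

array1 = np.array([[0, 27, 43, 10],
                   [0, 80, 15,  2],
                   [0,  3,  6,  9]])
array2 = np.array([[10., 4., 88.],
                   [ 2., 2., 95.],
                   [ 9., 2., 65.],
                   [43., 1., 62.],
                   [15., 5., 64.],
                   [ 6., 6., 67.],
                   [27., 5., 62.],
                   [80., 8., 73.],
                   [ 3., 9., 59.]])

# Map each key in column 0 to its replacement value in column 2.
lookup = {int(key): float(value) for key, value in zip(array2[:, 0], array2[:, 2])}

array3 = [
    [
        lookup[int(item)] if item != 0 else 0.0
        for item in row
    ]
    for row in array1
]
print(array3)
```

The result is a nested Python list; wrap it in np.array(...) if a NumPy array is needed.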

Replace entries of an array with values from a different array
