Question

I have a Python NumPy array. It is a 2D array whose second dimension holds sub-arrays of 3 integer elements. For example:

[ [2, 3, 4], [9, 8, 7], ... [15, 14, 16] ]

For each sub-array, I want to replace the smallest number with 1 and all other numbers with 0. So the desired output for the example above is:

[ [1, 0, 0], [0, 0, 1], ... [0, 1, 0] ]

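This is a large array, so I want to take advantage of NumPy's performance. I know how to use a condition to manipulate array elements, but how do I do that when the condition is dynamic? In this case, the condition would have to be something like: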

newarray = (a == min(a)).astype(int)

But how do I do this for each sub-array?
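For reference, the per-row intent of the pseudocode can be spelled out as a plain Python loop. This is a minimal sketch meant only for checking vectorized results on small inputs, not a fast solution; the helper name `mark_row_min_loop` is hypothetical.

```python
import numpy as np

def mark_row_min_loop(a):
    """Hypothetical reference helper: 1 where a row holds its minimum, 0 elsewhere."""
    a = np.asarray(a)
    out = np.zeros(a.shape, dtype=int)
    for r, row in enumerate(a):
        # The "dynamic" condition: each row is compared against its own minimum.
        out[r] = (row == row.min()).astype(int)
    return out

a = np.array([[2, 3, 4], [9, 8, 7], [15, 14, 16]])
print(mark_row_min_loop(a))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]
```

The loop runs once per row in Python, which is exactly the overhead the question wants to avoid on a large array.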

Answer 1

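You can pass the axis parameter to compute the minimum of each row, and keep the result two-dimensional with keepdims=True. Comparing a with these row minima (conceptually, a == a.minbyrow) then gives True at the position of the minimum in each sub-array: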

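import numpy as np

a = np.array([[2, 3, 4], [9, 8, 7], [15, 14, 16]])
# keepdims=True keeps the row minima as an (n, 1) column so they broadcast across each row
(a == a.min(1, keepdims=True)).astype(int)
#array([[1, 0, 0],
#       [0, 0, 1],
#       [0, 1, 0]])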
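`keepdims=True` is what makes the comparison work row by row: the minima come back with shape `(n, 1)`, which broadcasts across each row. Without it the minima have shape `(n,)`, and NumPy aligns that with the last axis, i.e. with the columns. The sketch below uses the question's example rows to show both shapes side by side; for a non-square array such as shape `(4, 3)`, the shapes `(4, 3)` and `(4,)` do not broadcast at all.

```python
import numpy as np

a = np.array([[2, 3, 4], [9, 8, 7], [15, 14, 16]])

mins_kept = a.min(1, keepdims=True)  # shape (3, 1): one minimum per row, as a column
mins_flat = a.min(1)                 # shape (3,): same values, but 1-D

print(mins_kept.shape, mins_flat.shape)  # (3, 1) (3,)

# (3, 3) vs (3, 1): each row is compared with its own minimum.
print((a == mins_kept).astype(int))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]

# (3, 3) vs (3,): the 1-D minima line up with the columns instead,
# so element [i, j] is compared with the minimum of row j, not row i.
print((a == mins_flat).astype(int))
# [[1 0 0]
#  [0 0 0]
#  [0 0 0]]
```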

Answer 2

How about this?

import numpy as np

a = np.random.random((4,3))

i = np.argmin(a, axis=-1)
out = np.zeros(a.shape, int)
out[np.arange(out.shape[0]), i] = 1

print(a)
print(out)

Sample output:

# [[ 0.58321885  0.18757452  0.92700724]
#  [ 0.58082897  0.12929637  0.96686648]
#  [ 0.26037634  0.55997658  0.29486454]
#  [ 0.60398426  0.72253012  0.22812904]]
# [[0 1 0]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]]
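One behavioral difference between the two answers matters when a row can contain its minimum more than once, which is plausible with integer data like the question's: the `==` comparison marks every position equal to the row minimum, while `np.argmin` returns only the first occurrence, so the indexing approach marks exactly one element per row. A small sketch with a hypothetical tied row:

```python
import numpy as np

# Hypothetical input: the second row contains its minimum (3) twice.
a = np.array([[2, 3, 4],
              [3, 5, 3]])

# Answer 1 style: every element equal to the row minimum becomes 1.
dense = (a == a.min(1, keepdims=True)).astype(int)

# Answer 2 style: argmin picks only the first minimum in each row.
i = np.argmin(a, axis=-1)
sparse = np.zeros(a.shape, int)
sparse[np.arange(a.shape[0]), i] = 1

print(dense)
# [[1 0 0]
#  [1 0 1]]
print(sparse)
# [[1 0 0]
#  [1 0 0]]
```

Which result is correct depends on how ties should be treated. With random floats, as in the example above, ties are practically impossible, so both approaches produce the same array.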

It appears to be somewhat faster than the direct approach, roughly 1.3 to 1.8 times in the runs below:

import numpy as np
from timeit import timeit

def dense():
    return (a == a.min(1, keepdims=True)).astype(int)

def sparse():
    i = np.argmin(a, axis=-1)
    out = np.zeros(a.shape, int)
    out[np.arange(out.shape[0]), i] = 1
    return out

for shp in ((4,3), (10000,3), (100,10), (100000,1000)):
    a = np.random.random(shp)
    d = timeit(dense, number=40)/40
    s = timeit(sparse, number=40)/40
    print('shape, dense, sparse, ratio', '({:6d},{:6d}) {:9.6g} {:9.6g} {:9.6g}'.format(*shp, d, s, d/s))

Sample run:

# shape, dense, sparse, ratio (     4,     3) 4.22172e-06 3.1274e-06   1.34992
# shape, dense, sparse, ratio ( 10000,     3) 0.000332396 0.000245348   1.35479
# shape, dense, sparse, ratio (   100,    10) 9.8944e-06 5.63165e-06   1.75693
# shape, dense, sparse, ratio (100000,  1000)  0.344177  0.189913   1.81229

Replace values in sub-arrays based on a dynamic condition in NumPy
