Question

So what is a concise and efficient way to convert a numpy array like:

[[0, 0, 1],
 [1, 0, 0],
 [0, 1, 0]]

into a column like:

[[2],
 [0],
 [1]]

where the number in each row of the column is the index of the "1" in the corresponding row of the original array of one-hot vectors?

I was thinking of looping through the rows and creating a list of the index value of 1, but I wonder if there is a more efficient way to do it. Thank you for any suggestions.

Answer 1

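Update: For a faster solution, see Divakar's answer.

You can use the nonzero() method of the numpy array. The second element of the tuple that it returns is what you want. For example,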

In [56]: x
Out[56]: 
array([[0, 0, 1, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 1],
       [1, 0, 0, 0]])

In [57]: x.nonzero()[1]
Out[57]: array([2, 2, 3, 3, 0])

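According to the docstring of numpy.nonzero(), "the values in a are always tested and returned in row-major, C-style order", so as long as you have exactly one 1 in each row, x.nonzero()[1] will give the position of the 1 in each row, starting from the first row. (And x.nonzero()[0] will be equal to range(x.shape[0]).)

To get the result as an array with shape (n, 1), you can use the reshape() method: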
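A quick way to check this on the array above is sketched below. The array x is re-created so the snippet is self-contained, and the names rows and cols are only illustrative:

```python
import numpy as np

x = np.array([[0, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]])

# nonzero() returns one index array per dimension: (row indices, column indices).
rows, cols = x.nonzero()

# With exactly one 1 per row, the row indices are simply 0..n-1 in order,
# so cols[i] is the position of the 1 in row i.
assert np.array_equal(rows, np.arange(x.shape[0]))
print(cols)  # [2 2 3 3 0]
```

If a row held no 1, or more than one, the assertion would fail and cols would no longer line up with the rows.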

In [59]: x.nonzero()[1].reshape(-1, 1)
Out[59]: 
array([[2],
       [2],
       [3],
       [3],
       [0]])

Or you can index with [:, np.newaxis]:

In [60]: x.nonzero()[1][:, np.newaxis]
Out[60]: 
array([[2],
       [2],
       [3],
       [3],
       [0]])

Answer 2

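We are working with a one-hot encoded array, which guarantees that there is exactly one 1 per row. So if we just look for the first nonzero index in each row, we get the desired result. We can therefore use np.argmax along each row, like so: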

a.argmax(axis=1)

If you want a 2D array as the output, just add a singleton dimension at the end:

a.argmax(axis=1)[:,None]
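One caveat: argmax returns 0 for a row that contains no 1 at all, so it silently accepts input that is not really one-hot. A minimal sketch of a wrapper that checks the assumption first is shown below. The function name one_hot_to_indices and the validation step are illustrative additions, not part of the approach above:

```python
import numpy as np

def one_hot_to_indices(a):
    """Return an (n, 1) column holding the index of the 1 in each row of `a`.

    Hypothetical helper: the validation below is an added safety check.
    """
    a = np.asarray(a)
    # Every entry must be 0 or 1, and every row must sum to exactly 1.
    is_binary = np.isin(a, (0, 1)).all()
    one_per_row = (a.sum(axis=1) == 1).all()
    if not (is_binary and one_per_row):
        raise ValueError("each row must contain exactly one 1 and otherwise 0s")
    # argmax finds the first maximum in each row, i.e. the position of the 1.
    return a.argmax(axis=1)[:, None]

a = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
print(one_hot_to_indices(a))
```

The check costs extra passes over the array, so it is probably best skipped where the input is already known to be valid and speed matters.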

Runtime test:

In [20]: # Let's create a sample hot encoded array
    ...: a = np.zeros((1000,1000),dtype=int)
    ...: idx = np.random.randint(0,1000,1000)
    ...: a[np.arange(1000),idx] = 1
    ...: 

In [21]: %timeit a.nonzero()[1] # @Warren Weckesser's soln
100 loops, best of 3: 9.03 ms per loop

In [22]: %timeit a.argmax(axis=1)
1000 loops, best of 3: 1.15 ms per loop
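To repeat this comparison outside IPython, something like the following sketch with the standard timeit module should work. The seed, the repeat counts, and the variable names are arbitrary choices, and absolute timings will vary with the machine and NumPy version:

```python
import timeit
import numpy as np

# Build a 1000 x 1000 one-hot array with a random hot column per row.
rng = np.random.default_rng(0)
n = 1000
a = np.zeros((n, n), dtype=int)
a[np.arange(n), rng.integers(0, n, size=n)] = 1

# Both approaches should agree on a valid one-hot array.
assert np.array_equal(a.nonzero()[1], a.argmax(axis=1))

candidates = [("nonzero", lambda: a.nonzero()[1]),
              ("argmax", lambda: a.argmax(axis=1))]
for label, func in candidates:
    # Best of 3 runs of 20 calls each, reported per call.
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f"{label}: {best * 1e3:.3f} ms per call")
```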

Turn a numpy array of one hot row vectors into a column vector of indices

2 answers: