Question

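I'm looking for a concise way to convert a vector of integers into a 2D array of binary values, where the ones sit in the columns whose indices correspond to the values of the vector.

That is: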

v = np.array([1, 5, 3])
C = np.zeros((v.shape[0], v.max()))

What I'm looking for is a way to turn C into this:

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.]])

I came up with this:

C[np.arange(v.shape[0]), v.T-1] = 1

But I'm wondering whether there is a less verbose, more elegant way to do it?

Thanks!

Update

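Thanks for your comments! There was a bug in my code: if v contains a 0, the 1 ends up in the wrong place (the last column), because v-1 becomes -1 and NumPy treats a negative index as counting from the end. Instead, I have to extend the categorical data to include 0.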
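The bug described above can be reproduced with a small sketch. The vector `[0, 2, 1]` and the names `C` and `D` are made up for illustration; the second half shows the fix of adding a column for 0 instead of subtracting 1.

```python
import numpy as np

v = np.array([0, 2, 1])  # illustrative input that contains a 0
rows = np.arange(v.shape[0])

# Original approach: v - 1 turns the value 0 into index -1,
# which NumPy resolves to the last column.
C = np.zeros((v.shape[0], v.max()))
C[rows, v - 1] = 1

# Fixed approach: one column per value 0..max, no offset.
D = np.zeros((v.shape[0], v.max() + 1))
D[rows, v] = 1

print(C)  # row 0 has its 1 in the last column
print(D)  # row 0 has its 1 in column 0
```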

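As long as you work exclusively with sparse matrices, jrennie's answer is a big win for large vectors. In my case, I need to return an array for compatibility, and the conversion cancels out the advantage entirely - see the two solutions: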

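    import numpy as np
    from scipy import sparse

    def permute_array(vector):
        # Dense version: one row per element, one column per value 0..max
        permut = np.zeros((vector.shape[0], vector.max()+1))
        permut[np.arange(vector.shape[0]), vector] = 1
        return permut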

    def permute_matrix(vector):
        indptr = range(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut

    In [193]: vec = np.random.randint(1000, size=1000)
    In [194]: np.all(permute_matrix(vec) == permute_array(vec))
    Out[194]: True

    In [195]: %timeit permute_array(vec)
    100 loops, best of 3: 3.49 ms per loop

    In [196]: %timeit permute_matrix(vec)
    1000 loops, best of 3: 422 µs per loop

Now, adding the conversion to a dense array:

    def permute_matrix(vector):
        indptr = range(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut.toarray()

    In [198]: %timeit permute_matrix(vec)
    100 loops, best of 3: 4.1 ms per loop

Answer 1

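One drawback of your solution is that it is inefficient for large values, since the dense array needs one column for every possible value. If you want a more efficient representation, create a scipy sparse matrix, e.g.: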

import scipy.sparse
import numpy

indices = [1, 5, 3]
indptr = range(len(indices)+1)
data = numpy.ones(len(indices))
matrix = scipy.sparse.csr_matrix((data, indices, indptr))

Read up on the Yale format and scipy's csr_matrix to better understand the objects (indices, indptr, data) and their usage.
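As a rough illustration of how the three arrays fit together, the sketch below rebuilds the same matrix and reads each row back out of the CSR structure. The variable names `m` and `row_contents` are only for this example.

```python
import numpy as np
from scipy import sparse

indices = np.array([1, 5, 3])          # column index of each stored value
indptr = np.arange(len(indices) + 1)   # row i owns entries indptr[i]:indptr[i+1]
data = np.ones(len(indices))           # the stored values themselves

m = sparse.csr_matrix((data, indices, indptr))

# Walk the rows using indptr as the row delimiter.
row_contents = []
for i in range(m.shape[0]):
    start, end = m.indptr[i], m.indptr[i + 1]
    row_contents.append((m.indices[start:end].tolist(), m.data[start:end].tolist()))

print(m.shape)       # shape is inferred from indptr and the largest column index
print(row_contents)  # one (columns, values) pair per row
```

Because every row holds exactly one entry, indptr is simply 0, 1, 2, ..., n, which is why a plain range works in the code above.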

Note that I am not subtracting 1 from the indices in the code above. If that is what you want, use indices = numpy.array([1, 5, 3])-1.
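For completeness, here is a minimal sketch of the 1-based variant, converted to a dense array so it can be compared with the output requested in the question. Passing `shape` explicitly is an addition of this example; it is not strictly required here, but it pins the number of columns rather than leaving it to be inferred.

```python
import numpy as np
from scipy import sparse

v = np.array([1, 5, 3])
indices = v - 1                        # shift 1-based values to 0-based columns
indptr = np.arange(len(indices) + 1)   # exactly one entry per row
data = np.ones(len(indices))

dense = sparse.csr_matrix(
    (data, indices, indptr),
    shape=(len(v), int(v.max())),      # explicit shape: 3 rows, 5 columns
).toarray()

print(dense)
```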
