Question

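I went through these threads:

They all discuss several ways of computing a matrix with unique rows and columns.

However, the solutions look a bit convoluted, at least to the untrained eye. Here, for example, is the top solution from the first thread, which (correct me if I am wrong) I believe is the safest and fastest: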

np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])

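Either way, the above solution only returns the matrix of unique rows. What I am looking for is something along the lines of the original functionality of np.unique: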

u, indices = np.unique(a, return_inverse=True)

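which returns not only the list of unique entries but also which unique entry each item belongs to. How can I do this for columns?

Here is an example of what I am looking for: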

array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
       [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

We would have:

u       = array([0,1,2,3,4])
indices = array([0,1,0,1,1,2,3,4,4,3])

where the different values in u represent the set of unique columns in the original array:

0 -> [0,0]
1 -> [2,1]
2 -> [0,1]
3 -> [2,2]
4 -> [1,2]

Answer 1

First let's get the unique indices. To do so, we need to start by transposing your array:

>>> a=a.T

Now use a modified version of the snippet above to get the unique indices.

>>> ua, uind = np.unique(np.ascontiguousarray(a).view(np.dtype((np.void,a.dtype.itemsize * a.shape[1]))),return_inverse=True)

>>> uind
array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])

#Thanks to @Jamie
>>> ua = ua.view(a.dtype).reshape(ua.shape + (-1,))
>>> ua
array([[0, 0],
       [0, 1],
       [1, 2],
       [2, 1],
       [2, 2]])

As a sanity check:

>>> np.all(a==ua[uind])
True
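For readers on Python 3, the steps so far can be collected into one self-contained sketch. The helper name unique_rows_with_inverse is made up for illustration, and the extra ravel() is a defensive assumption so that the inverse comes back 1-D whatever the NumPy version:

```python
import numpy as np

def unique_rows_with_inverse(rows):
    """Hypothetical helper: unique rows plus inverse, via the void-view trick."""
    rows = np.ascontiguousarray(rows)
    # View each row as one opaque item so np.unique compares whole rows.
    row_dtype = np.dtype((np.void, rows.dtype.itemsize * rows.shape[1]))
    packed = rows.view(row_dtype).ravel()
    uniq, inverse = np.unique(packed, return_inverse=True)
    # Unpack the opaque items back into ordinary rows.
    uniq_rows = uniq.view(rows.dtype).reshape(-1, rows.shape[1])
    return uniq_rows, inverse

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

# The columns of a are the rows of a.T.
ua, uind = unique_rows_with_inverse(a.T)

# Same sanity check as above: the inverse rebuilds the transposed array.
assert np.array_equal(ua[uind], a.T)
```

Because the rows are compared as raw bytes, the order of ua is only guaranteed to be consistent, not numerically sorted in general; for the small non-negative integers used here the two coincide.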

To reproduce your chart:

>>> for x in range(ua.shape[0]):
...     print(x, '->', ua[x])
...
0 -> [0 0]
1 -> [0 1]
2 -> [1 2]
3 -> [2 1]
4 -> [2 2]

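To do exactly what you asked, working on the original (untransposed) a, although this will be a bit slower if the array has to be converted: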

>>> b=np.asfortranarray(a).view(np.dtype((np.void,a.dtype.itemsize * a.shape[0])))
>>> ua,uind=np.unique(b,return_inverse=True)
>>> uind
array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])
>>> ua.view(a.dtype).reshape(ua.shape+(-1,),order='F')
array([[0, 0, 1, 2, 2],
       [0, 1, 2, 1, 2]])

#To return this in the previous order.
>>> ua.view(a.dtype).reshape(ua.shape + (-1,))
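On newer NumPy there may be a shorter route. The sketch below assumes NumPy 1.13 or later, where np.unique accepts an axis argument; it is not part of the approach above, just an alternative that avoids the void view and the transposes:

```python
import numpy as np

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

# axis=1 makes np.unique treat each column as a single item.
ucols, inv = np.unique(a, axis=1, return_inverse=True)

# The shape of the inverse has varied between NumPy releases,
# so flatten it defensively before using it as an index.
inv = np.ravel(inv)

# Each original column is recovered from its unique-column label.
assert np.array_equal(ucols[:, inv], a)
```

Here ucols holds the unique columns and inv plays the role of uind.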

Answer 2

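Essentially, you want np.unique to return the indices of the unique columns as well as the indices of where they are used? This is easy enough to do by transposing the matrix and then using the code from the other question, with the addition of return_inverse=True.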

at = a.T
b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1])))
_, u, indices = np.unique(b, return_index=True, return_inverse=True)

With your a, this gives:

In [35]: u
Out[35]: array([0, 5, 7, 1, 6])

In [36]: indices
Out[36]: array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])

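It is not entirely clear to me what you want u to be, however. If you want it to be the unique columns, then you could use the following instead: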

at = a.T
b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1])))
_, idx, indices = np.unique(b, return_index=True, return_inverse=True)
u = a[:,idx]

This would give:

In [41]: u
Out[41]:
array([[0, 0, 1, 2, 2],
       [0, 1, 2, 1, 2]])

In [42]: indices
Out[42]: array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])
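One difference from the question is worth noting: np.unique numbers the groups in sorted order, while the table in the question (0 -> [0,0], 1 -> [2,1], 2 -> [0,1], ...) numbers them in order of first appearance. If that numbering matters, a possible sketch is to rank the groups by their first index. The names order and relabel are illustrative, and the ravel() is a defensive assumption so the outputs are 1-D on any NumPy version:

```python
import numpy as np

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

at = np.ascontiguousarray(a.T)
b = at.view(np.dtype((np.void, at.dtype.itemsize * at.shape[1]))).ravel()
_, idx, inv = np.unique(b, return_index=True, return_inverse=True)

# order[k] is the sorted label of the k-th group to appear in a.
order = np.argsort(idx)
# relabel maps a sorted label to its first-appearance label.
relabel = np.empty_like(order)
relabel[order] = np.arange(order.size)

u = a[:, idx[order]]     # unique columns, in order of first appearance
indices = relabel[inv]   # membership of every column under that numbering

assert np.array_equal(u[:, indices], a)
```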

Answer 3

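Not entirely sure what you are after, but have a look at the numpy_indexed package (disclaimer: I am its author); it is sure to make problems of this kind easier: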

import numpy as np
import numpy_indexed as npi
unique_columns = npi.unique(A, axis=1)
# or perhaps this is what you want?
unique_columns, indices = npi.group_by(A.T, np.arange(A.shape[1]))

Find unique columns and column membership
