Question

Consider a 2D array A:

2   3   5   7
2   3   5   7
1   7   1   4
5   8   6   0
2   3   5   7

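The first, second, and last rows are identical. I am looking for an algorithm that returns, for each distinct row, the number of rows identical to it (i.e. the number of repetitions of each element). It would be nice if the script could be easily modified to count identical columns as well.

I currently use an inefficient naive algorithm to do this: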

import numpy
A = numpy.array([[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]])
i = 0
end = len(A)
while i < end:
    print(i, end=" ")
    j = i + 1
    numberID = 1  # row i itself plus every later row equal to it
    while j < end:
        print(j)
        if numpy.array_equal(A[i, :], A[j, :]):
            numberID += 1
        j += 1
    i += 1
print(A, len(A))

Expected result:

array([3,1,1]) # number identical arrays per line

My algorithm looks like plain native Python wrapped around numpy, which is why it is inefficient. Thanks for any help.

Answer 1

In numpy >= 1.9.0, np.unique has a return_counts keyword argument, which you can combine with the trick of viewing each row as a single np.void item to get the counts:

import numpy as np

b = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
unq_a, unq_cnt = np.unique(b, return_counts=True)
unq_a = unq_a.view(A.dtype).reshape(-1, A.shape[1])

>>> unq_a
array([[1, 7, 1, 4],
       [2, 3, 5, 7],
       [5, 8, 6, 0]])

>>> unq_cnt
array([1, 3, 1])
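
For readers on numpy >= 1.13, np.unique also accepts an axis argument, so the void view may not be needed. The following is a minimal sketch under that version assumption, not part of the original answer; the names first_idx and counts_in_order are made up for illustration. It also shows one way to reorder the counts by first appearance (to match the expected result in the question) and to count identical columns.

```python
import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Unique rows (sorted lexicographically), index of each row's first
# occurrence in A, and how many times each unique row appears.
unique_rows, first_idx, row_counts = np.unique(
    A, axis=0, return_index=True, return_counts=True)

# Reorder the counts so they follow the order of first appearance in A.
counts_in_order = row_counts[np.argsort(first_idx)]

# The same call with axis=1 counts identical columns instead.
unique_cols, col_counts = np.unique(A, axis=1, return_counts=True)

print(row_counts)       # counts in sorted-row order
print(counts_in_order)  # counts in order of first appearance
print(col_counts)       # every column of A is distinct
```

Here counts_in_order comes out as [3, 1, 1], the ordering asked for in the question, while row_counts follows the sorted order of the unique rows.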

In older versions of numpy, you can replicate what np.unique does, which would look something like this:

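# Work on a copy so that sorting does not modify A
a_view = np.array(A, copy=True)
# View each row as one opaque np.void item
a_view = a_view.view(np.dtype((np.void,
                               a_view.dtype.itemsize*a_view.shape[1]))).ravel()
a_view.sort()
# True wherever a sorted row differs from the one before it
a_flag = np.concatenate(([True], a_view[1:] != a_view[:-1]))
# Index the sorted view (not the unsorted A) and convert back to rows
a_unq = a_view[a_flag].view(A.dtype).reshape(-1, A.shape[1])
a_idx = np.concatenate(np.nonzero(a_flag) + ([a_view.size],))
a_cnt = np.diff(a_idx)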

>>> a_unq
array([[1, 7, 1, 4],
       [2, 3, 5, 7],
       [5, 8, 6, 0]])

>>> a_cnt
array([1, 3, 1])

Answer 2

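You can lexsort on the row entries, which gives you the indices for traversing the rows in sorted order, so that after the O(n log n) sort the search for duplicates is a single O(n) pass rather than O(n^2). Note that lexsort treats the last key as the primary one, so here the last column is the most significant, i.e. the rows are 'alphabetized' right to left rather than left to right.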

In [9]: a
Out[9]: 
array([[2, 3, 5, 7],
       [2, 3, 5, 7],
       [1, 7, 1, 4],
       [5, 8, 6, 0],
       [2, 3, 5, 7]])

In [10]: lexsort(a.T)
Out[10]: array([3, 2, 0, 1, 4])

In [11]: a[lexsort(a.T)]
Out[11]: 
array([[5, 8, 6, 0],
       [1, 7, 1, 4],
       [2, 3, 5, 7],
       [2, 3, 5, 7],
       [2, 3, 5, 7]])
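
Once the rows are sorted, identical rows sit next to each other, so the counts can be read off in one pass. The following is a minimal sketch of that step, not taken from the original answer; the variable names (is_new, starts, counts) are illustrative.

```python
import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Sort the rows; lexsort uses the last column as the primary key.
order = np.lexsort(A.T)
sorted_rows = A[order]

# A row starts a new group if it differs from its predecessor in any column.
differs = np.any(sorted_rows[1:] != sorted_rows[:-1], axis=1)
is_new = np.concatenate(([True], differs))

# Group start positions, then group sizes as gaps between consecutive starts.
starts = np.flatnonzero(is_new)
counts = np.diff(np.append(starts, len(sorted_rows)))
unique_rows = sorted_rows[starts]

print(unique_rows)
print(counts)
```

The counts are reported in lexsort order, so they line up with unique_rows rather than with the original row order of A. To count identical columns, the same steps could be applied to A.T.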

Answer 3

You can use the Counter class from the collections module for this.

It works like this:

x = [2, 2, 1, 5, 2]
from collections import Counter
c=Counter(x)
print(c)

Output: Counter({2: 3, 1: 1, 5: 1})

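The only problem in your case is that each value of x is itself a list, which is an unhashable data structure. If you convert each value of x to a tuple, it should work: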

x = [(2,  3,  5,  7),(2,  3,  5,  7),(1,  7,  1,  4),(5,  8,  6,  0),(2,  3,  5,  7)]
from collections import Counter
c=Counter(x)
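print(c)

Output: Counter({(2, 3, 5, 7): 3, (5, 8, 6, 0): 1, (1, 7, 1, 4): 1}) (the order of entries with equal counts may differ between Python versions)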
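
To apply this to the numpy array from the question, each row first has to become a tuple. A minimal sketch, assuming the array is small enough that converting it to Python lists is acceptable; the names row_counts and col_counts are illustrative.

```python
from collections import Counter

import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# tolist() yields plain Python ints; tuple() makes each row hashable.
row_counts = Counter(map(tuple, A.tolist()))

# Transposing first counts identical columns instead of rows.
col_counts = Counter(map(tuple, A.T.tolist()))

print(row_counts)
print(col_counts)
```

This runs in pure Python, so for large arrays it is likely to be slower than the sorting-based numpy approaches.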

Fastest way to count identical sub-arrays in an nd-array?
