numpy - counting equal arrays

Date: 2016-08-22 05:21:06

Tags: python-2.7 numpy

I want to count how many equal matrices I get after splitting a large matrix.

import numpy as np

mat1 = np.zeros((4, 8))

split4x4 = np.split(mat1, 4)  # splits along axis 0: four arrays of shape (1, 8)

Now I want to know how many of the matrices in split4x4 are equal, but collections.Counter(split4x4) raises an error. Is there a built-in way to do this in numpy?
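collections.Counter needs hashable keys, and NumPy arrays are not hashable, so Counter(split4x4) fails with "TypeError: unhashable type: 'numpy.ndarray'". One workaround, sketched here as an assumption rather than a numpy built-in, is to turn each piece into a hashable key (its shape plus a tuple of its values) before counting. The helper name count_equal_arrays is hypothetical.

```python
import collections

import numpy as np


def count_equal_arrays(arrays):
    """Count how many times each distinct array occurs (hypothetical helper).

    Arrays are unhashable, so each one is converted to a hashable key:
    its shape plus a tuple of its values in C order.
    """
    return collections.Counter(
        (a.shape, tuple(a.ravel().tolist())) for a in arrays
    )


mat1 = np.zeros((4, 8))
split4x4 = np.split(mat1, 4)  # four arrays of shape (1, 8)
counts = count_equal_arrays(split4x4)
for (shape, values), n in counts.items():
    print(shape, n)  # (1, 8) 4
```

Because the values go through tolist(), equality follows Python's == on the elements; this is convenient for small inputs but loops in Python, so it will not scale like a vectorized approach.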

2 answers:

Answer 0 (score: 1)

This can be done in a fully vectorized way with the numpy_indexed package (disclaimer: I am its author):

import numpy_indexed as npi
unique_rows, row_counts = npi.count(mat1)

This should be much faster than using collections.Counter.
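npi.count(mat1) counts the unique rows of mat1. That matches the question as written because np.split(mat1, 4) splits along axis 0, so each of the four pieces is a single row. If the intent is instead to compare multi-row blocks (the name split4x4 hints at 4x4 blocks, but that is an assumption), each block can be flattened into one row first. The sketch below uses plain NumPy's np.unique(..., axis=0, return_counts=True) (the axis argument needs NumPy 1.13 or later) in place of npi.count; the helper count_equal_blocks is hypothetical.

```python
import numpy as np


def count_equal_blocks(mat, n_blocks, axis=1):
    """Split `mat` into `n_blocks` equal pieces along `axis` and count duplicates.

    Hypothetical helper: each block is flattened into one row, so counting
    equal blocks becomes counting equal rows with np.unique(axis=0).
    """
    blocks = np.split(mat, n_blocks, axis=axis)
    rows = np.array([b.ravel() for b in blocks])
    unique_rows, counts = np.unique(rows, axis=0, return_counts=True)
    # Restore the original block shape for the unique blocks.
    unique_blocks = unique_rows.reshape((-1,) + blocks[0].shape)
    return unique_blocks, counts


mat1 = np.zeros((4, 8))
blocks, counts = count_equal_blocks(mat1, 2)  # two 4x4 blocks
print(blocks.shape, counts)  # (1, 4, 4) [2]
```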

Answer 1 (score: 1)

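Perhaps the simplest approach is to use np.unique: flatten each split array into a single row and count the unique rows:

import numpy as np
# Generate some sample data:
a = np.random.uniform(size=(8, 3))
# With repetition:
a = np.r_[a, a]
# Split a into 4 arrays of shape (4, 3):
s = np.asarray(np.split(a, 4))
# Flatten each array into one row, giving shape (4, 12):
s = s.reshape(len(s), -1)
# Count unique rows; axis=0 compares whole rows instead of single elements:
unique_rows, counts = np.unique(s, axis=0, return_counts=True)

Note: the return_counts parameter of np.unique was added in NumPy 1.9.0, and the axis parameter in NumPy 1.13.0.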
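The axis argument matters here: without it, np.unique flattens its input and counts individual numbers, so passing it a list of flattened pieces (for example as tuples) counts values rather than pieces. A small sketch contrasting the two calls on made-up data:

```python
import numpy as np

# Three flattened pieces; the first and last are identical.
pieces = [(1.0, 2.0), (3.0, 4.0), (1.0, 2.0)]

# Without axis: the input is flattened, so individual numbers are counted.
values, value_counts = np.unique(pieces, return_counts=True)
print(values, value_counts)  # [1. 2. 3. 4.] [2 2 1 1]

# With axis=0: whole rows are compared, so each distinct piece is counted.
rows, row_counts = np.unique(pieces, axis=0, return_counts=True)
print(rows, row_counts)
```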

Another pure-NumPy solution, inspired by that post:
# Generate some sample data:
In: a = np.random.uniform(size=(8,3))
# With some repetition
In: a = np.r_[a, a]
In: a.shape
Out: (16,3)
# Split a in 4 arrays
In: s = np.asarray(np.split(a, 4))
In: print(s)
Out: [[[ 0.78284847  0.28883662  0.53369866]
       [ 0.48249722  0.02922249  0.0355066 ]
       [ 0.05346797  0.35640319  0.91879326]
       [ 0.1645498   0.15131476  0.1717498 ]]

      [[ 0.98696629  0.8102581   0.84696276]
       [ 0.12612661  0.45144896  0.34802173]
       [ 0.33667377  0.79371788  0.81511075]
       [ 0.81892789  0.41917167  0.81450135]]

      [[ 0.78284847  0.28883662  0.53369866]
       [ 0.48249722  0.02922249  0.0355066 ]
       [ 0.05346797  0.35640319  0.91879326]
       [ 0.1645498   0.15131476  0.1717498 ]]

      [[ 0.98696629  0.8102581   0.84696276]
       [ 0.12612661  0.45144896  0.34802173]
       [ 0.33667377  0.79371788  0.81511075]
       [ 0.81892789  0.41917167  0.81450135]]]
In: s.shape
Out: (4, 4, 3)
# Flatten the array:
In: s = np.asarray([e.flatten() for e in s])
In: s.shape
Out: (4, 12)
# Sort the rows using lexsort:
In: idx = np.lexsort(s.T)
In: s_sorted = s[idx]
# Create a mask to get unique rows
In: row_mask = np.append([True],np.any(np.diff(s_sorted,axis=0),1))
# Get unique rows:
In: out = s_sorted[row_mask]
# and count:
In: for e in out:
        count = (e == s).all(axis=1).sum()
        print(e.reshape(4, 3), count)
Out:[[ 0.78284847  0.28883662  0.53369866]
     [ 0.48249722  0.02922249  0.0355066 ]
     [ 0.05346797  0.35640319  0.91879326]
     [ 0.1645498   0.15131476  0.1717498 ]] 2
    [[ 0.98696629  0.8102581   0.84696276]
     [ 0.12612661  0.45144896  0.34802173]
     [ 0.33667377  0.79371788  0.81511075]
     [ 0.81892789  0.41917167  0.81450135]] 2
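The final loop compares each unique row against all of s, which costs one full pass per unique block. Since s_sorted already places equal rows next to each other and row_mask marks the first row of each group, the counts can instead be read off as the gaps between consecutive group starts. A self-contained sketch of that variation (count_sorted_groups is a hypothetical name; it assumes a 2-D input with at least one row):

```python
import numpy as np


def count_sorted_groups(s):
    """Count equal rows of a 2-D array via lexsort (hypothetical helper).

    Returns the unique rows and how often each occurs, without a Python loop.
    """
    idx = np.lexsort(s.T)                 # sort rows so equal rows are adjacent
    s_sorted = s[idx]
    # True at the first row of every group of equal rows.
    row_mask = np.append([True], np.any(np.diff(s_sorted, axis=0), axis=1))
    starts = np.flatnonzero(row_mask)     # index where each group begins
    counts = np.diff(np.append(starts, len(s_sorted)))
    return s_sorted[row_mask], counts


a = np.random.uniform(size=(8, 3))
a = np.r_[a, a]                           # every block appears twice
s = np.asarray(np.split(a, 4)).reshape(4, -1)
out, counts = count_sorted_groups(s)
print(out.shape, counts)  # (2, 12) [2 2]
```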