Question

For example, given:

import numpy as np
data = np.array(
    [[0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0]])

I want to get a 3D array that looks like this:

result = array([[[ 2.,  0.],
                 [ 0.,  2.]],

                [[ 0.,  2.],
                 [ 0.,  0.]]])

One way to do it is:

newArray = np.zeros((2, 2, 2))
for row in data:
    newArray[row[0], row[1], row[2]] += 1

What I am trying to do is the following:

result = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            result[i, j, k] = (data[data[data[:, 0] == i, 1] == j, 2] == k).sum()

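This does not seem to work. I would like to get the desired result by sticking with my implementation rather than the one mentioned at the beginning (and without any extra imports, such as Counter).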

Thanks.

Answer 1

You can also use numpy.histogramdd:

>>> np.histogramdd(data, bins=(2, 2, 2))[0]
array([[[ 2.,  0.],
        [ 0.,  2.]],

       [[ 0.,  2.],
        [ 0.,  0.]]])
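One caveat worth adding: without an explicit range argument, histogramdd derives the bin edges from each column's minimum and maximum, so the bins only line up with integer indices when every column actually spans its full index range. A minimal sketch, assuming the indices are non-negative integers and the output shape is known in advance (shape is a name chosen here for illustration), centres each bin on an integer so index k always lands in bin k:

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])

shape = (2, 2, 2)  # assumed output shape, one entry per column of data

# Bin k along each axis covers [k - 0.5, k + 0.5), so integer index k falls in bin k.
ranges = [(-0.5, n - 0.5) for n in shape]
counts, _ = np.histogramdd(data, bins=shape, range=ranges)
print(counts)
```

The result is a float array, as with the plain call above; use counts.astype(int) if integer counts are needed.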

Answer 2

You can do the following:

#Get output dimension and construct output array.
>>> dshape = tuple(data.max(axis=0)+1)
>>> dshape
(2, 2, 2)
>>> out = np.zeros(dshape)

If you have NumPy 1.8+:

>>> np.add.at(out, tuple(data.T), 1)  # unbuffered, so repeated indices accumulate

Otherwise:

#Get indices and unique the resulting array
>>> inds = np.ravel_multi_index(data.T, dshape)
>>> inds, inverse = np.unique(inds, return_inverse=True)
>>> values = np.bincount(inverse)

>>> values
array([2, 2, 2])

>>> out.flat[inds] = values
>>> out
array([[[ 2.,  0.],
        [ 0.,  2.]],

       [[ 0.,  2.],
        [ 0.,  0.]]])
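A variant that needs neither np.add.at nor np.unique (an addition here, not part of the original answer) is to pass the flat indices straight to np.bincount with minlength set to the total number of cells, then reshape. A minimal sketch, assuming every value in data is a non-negative integer:

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])

dshape = tuple(data.max(axis=0) + 1)         # (2, 2, 2) for this data
flat = np.ravel_multi_index(data.T, dshape)  # one flat cell index per row
# Count how often each flat cell occurs; minlength pads never-seen cells with 0.
out = np.bincount(flat, minlength=int(np.prod(dshape))).reshape(dshape)
print(out)
```

Because bincount already returns a zero for every cell that never occurs, there is no separate zeros array to fill, and the counts come back as integers.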

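NumPy versions before 1.8 do not have np.add.at, and without it the first snippet will not work. Since ravel_multi_index may not be the fastest approach, you could also consider finding the unique rows of the NumPy array instead; in practice the two operations should be equivalent.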
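The unique-rows idea can be sketched with np.unique(..., axis=0, return_counts=True). Note that the axis argument requires NumPy 1.13 or later, so this does not help on the old versions discussed above; it is shown only as an illustration of the alternative:

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])

# Each distinct row appears once in rows; counts says how often it occurred.
rows, counts = np.unique(data, axis=0, return_counts=True)

out = np.zeros(tuple(data.max(axis=0) + 1))
# The columns of rows form a multi-index; the rows are distinct, so plain assignment is safe.
out[tuple(rows.T)] = counts
print(out)
```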

Answer 3

The problem is that data[data[data[:,0]==i, 1]==j, 2]==k is not what you expect.

Let's break down the case (i, j, k) == (0, 0, 0).

data[:,0]==0 is [True, True, False, False, True, True], and data[data[:,0]==0] correctly gives us the rows whose first number is 0.

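Next, from those rows we pick the ones whose second number is 0: data[data[:,0]==0, 1]==0, which gives [True, False, False, True]. This is where the problem lies. That mask refers to the four filtered rows, but it is applied to the original data. So data[data[data[:,0]==0, 1]==0] does not give us the rows whose first and second numbers are both 0, but rows 0 and 3 of data instead:

In [51]: data[data[data[:,0]==0, 1]==0]
Out[51]:
array([[0, 0, 0],
       [1, 0, 1]])

(Recent NumPy versions raise an IndexError here instead, because the 4-element mask does not match the 6 rows of data.)

If we now filter these rows on the third number, we get the wrong result with respect to the original data.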

That is why your approach does not work. See the other answers for better approaches.
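If you want to keep the nested-loop structure from the question, one fix (a sketch, not taken from the answers here) is to build every condition as a mask over the full array and combine the masks with &, so all of them have the same length as data and line up row by row. It uses only NumPy, which the question already imports:

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]])

dims = data.max(axis=0) + 1  # number of index values per column
result = np.zeros(tuple(dims))

for i in range(dims[0]):
    for j in range(dims[1]):
        for k in range(dims[2]):
            # Each comparison covers all len(data) rows, so the masks combine row by row.
            mask = (data[:, 0] == i) & (data[:, 1] == j) & (data[:, 2] == k)
            result[i, j, k] = mask.sum()

print(result)
```

This scans the whole array once per cell, so it is much slower than the vectorized answers for large outputs, but it stays close to the original idea.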

Answer 4

Don't be afraid of imports. They are what makes Python awesome.

Assuming, as the question does, that you already have the result matrix:

import numpy as np
data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0]]
)
result = np.zeros((2,2,2))

# range of each dim, aka allowable values for each dim
dim_ranges = [(0, n - 1) for n in result.shape]
dim_ranges
# Out[]:
#     [(0, 1), (0, 1), (0, 1)]

# Multidimensional histogram will effectively "count" along each dim
sums,_ = np.histogramdd(data,bins=result.shape,range=dim_ranges)
result += sums
result
# Out[]:
#     array([[[ 2.,  0.],
#             [ 0.,  2.]],
#
#            [[ 0.,  2.],
#             [ 0.,  0.]]])

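This solution works for a result ndarray of any shape, as long as it has one dimension per column of data. It also works correctly even if your data ndarray contains indices outside the bounds of the result matrix: those rows are simply not counted.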
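A small check of that last claim, using per-dimension ranges of (0, n - 1), which is what the answer's dim_ranges contains. The extra row [2, 0, 0] is made up purely for this demonstration:

```python
import numpy as np

data = np.array(
    [[0, 0, 0],
     [0, 1, 1],
     [1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 0, 0],
     [2, 0, 0]])  # hypothetical row whose first index is outside a (2, 2, 2) result
result = np.zeros((2, 2, 2))

dim_ranges = [(0, n - 1) for n in result.shape]
sums, _ = np.histogramdd(data, bins=result.shape, range=dim_ranges)
result += sums
# Only the six in-range rows are counted; the [2, 0, 0] row is dropped by range.
print(result.sum())
```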

Python: Count identical rows in an array (without any imports)
