Question

I have an array of distances, from which I compute an array of indices.

d    
array([[  0.        ,   5.38516481,   8.60232527,   7.61577311,
          3.        ,   4.12310563,  12.36931688],
       [  5.38516481,   0.        ,   5.        ,   7.        ,
          7.07106781,   2.        ,  13.34166406],
       [  8.60232527,   5.        ,   0.        ,   6.164414  ,
          8.77496439,   6.70820393,  10.34408043],
       [  7.61577311,   7.        ,   6.164414  ,   0.        ,
          8.18535277,   8.06225775,  10.04987562],
       [  3.        ,   7.07106781,   8.77496439,   8.18535277,
          0.        ,   6.164414  ,  10.09950494],
       [  4.12310563,   2.        ,   6.70820393,   8.06225775,
          6.164414  ,   0.        ,  13.92838828],
       [ 12.36931688,  13.34166406,  10.34408043,  10.04987562,
         10.09950494,  13.92838828,   0.        ]])
a = np.argsort(d,axis=1)[:,-3:]
a

array([[3, 2, 6],
       [3, 4, 6],
       [0, 4, 6],
       [5, 4, 6],
       [3, 2, 6],
       [2, 3, 6],
       [0, 1, 5]], dtype=int64)

I need to check the columns cumulatively, from the last column to the first.

I tried this:

unique, counts = np.unique(a, return_counts=True)
x = dict(zip(unique, counts))
sorted(x.items(), key = lambda x: x[1],reverse=True)

[(6, 6), (3, 4), (2, 3), (4, 3), (0, 2), (5, 2), (1, 1)]

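In the list of tuples above, (2, 3) and (4, 3) have the same count. But when the columns are checked cumulatively from the last to the first, I need the list ordered as (4, 3), (2, 3), because 4 occurs earlier than 2 in that column order.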

Expected output for the 3 values with the most occurrences:

[6, 3, 4]
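
To illustrate the tie between 2 and 4, here is a minimal sketch that counts each value per column and accumulates the counts from the last column to the first. The helper name cumulative_counts is made up for this illustration and is not part of the original post.

```python
import numpy as np

# The index array a from the question (last three argsort columns of d).
a = np.array([[3, 2, 6],
              [3, 4, 6],
              [0, 4, 6],
              [5, 4, 6],
              [3, 2, 6],
              [2, 3, 6],
              [0, 1, 5]])

def cumulative_counts(arr, value):
    """Running count of `value`, starting at the last column (hypothetical helper)."""
    per_column = (arr == value).sum(axis=0)  # occurrences in each column, first to last
    return np.cumsum(per_column[::-1])       # accumulate from the last column backwards

cum2 = cumulative_counts(a, 2)
cum4 = cumulative_counts(a, 4)
print(cum2.tolist())  # [0, 2, 3]
print(cum4.tolist())  # [0, 3, 3]
```

Both values end at a total of 3, but after the last two columns 4 has already been seen 3 times while 2 has only been seen twice, which is why 4 should be listed first.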

For a validation check:

a = np.array([[2, 3, 6],
   [2, 4, 5],
   [0, 4, 3],
   [1, 4, 6],
   [2, 3, 5],
   [3, 2, 6],
   [0, 1, 5]])
unique, counts = np.unique(a, return_counts=True)
x = dict(zip(unique, counts))
sorted(x.items(), key = lambda x: x[1],reverse=True)

[(2, 4), (3, 4), (4, 3), (5, 3), (6, 3), (0, 2), (1, 2)]

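In the list above, we need (3, 4) and then (2, 4), and likewise (5, 3), (6, 3) and then (4, 3), because (5, 3) and (6, 3) occur in the last column, which is checked before the earlier columns. Finally, if values have the same count in the same column, as the pair (5, 3), (6, 3) does, the index with the largest distance should be shown first, using the distances given in the d array above.
Note: the validation matrix was created by hand, so no distances exist for it; only the first matrix is real.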

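Please give me a general solution that can be applied to any such array. I tried to code it but could not get the logic right. I know I can apply np.argmax() over the columns, but I need a cumulative check.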

If you don't understand any part of the question, please leave a comment and I will clarify.

Answer 1

The following requires numpy 1.13+, because it uses the new axis parameter of unique.
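
As a quick illustration of that axis parameter, here is a small sketch on a made-up array of (value, column) pairs. With axis=0, np.unique treats each row as one item, so identical rows are counted together.

```python
import numpy as np

# Toy (value, column index) pairs, invented for this example.
pairs = np.array([[2, 0],
                  [3, 1],
                  [2, 0],
                  [3, 0]])

# axis=0: deduplicate whole rows instead of individual scalars.
rows, counts = np.unique(pairs, axis=0, return_counts=True)
print(rows.tolist())    # [[2, 0], [3, 0], [3, 1]]
print(counts.tolist())  # [2, 1, 1]
```

The unique rows come back sorted lexicographically, which is what lets the code below map each pair to a (value, column) cell in a count table.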

import numpy as np

a = np.array([[2, 3, 6],
   [2, 4, 5],
   [0, 4, 3],
   [1, 4, 6],
   [2, 3, 5],
   [3, 2, 6],
   [0, 1, 5]]) * 1000 # do not rely on uniques being 0,1,2,3...

# add column indices
ac = np.c_[a.ravel(), np.outer(np.ones((len(a),), a.dtype), np.arange(3)).ravel()]

# find uniq pairs (data, col ind)
uniq, cnts = np.unique(ac, return_counts=True, axis=0)
uniquniq, uniqidx = np.unique(uniq[:, 0], return_inverse=True)

# make grid uniq els x col idx fill with counts
fullcnts = np.zeros((len(uniquniq), 3), dtype=int)
fullcnts[uniqidx, uniq[:, 1]] = cnts
cumcnts = np.cumsum(fullcnts[:, ::-1], axis=-1)

# order by sum and then column cnts as tie breakers
order = np.lexsort((cumcnts[:, 1], cumcnts[:, 0], cumcnts[:, 2]))[::-1]
result = list(zip(uniquniq[order], cumcnts[order, 2]))

# [(3000, 4), (2000, 4), (6000, 3), (5000, 3), (4000, 3), (1000, 2), (0, 2)]

Line by line:

1) We create a new array that looks like [(2,0), (3,1), (6,2), (2,0), (4,1), (5,2), ...], i.e. each element of a together with its column index.

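2) This way we can use unique to count occurrences per column. For example, the count returned for the uniq element (2,0) is the number of 2s in column 0.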

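3) From the unique pairs we now extract the actual unique values. uniqidx is the same as uniq[:, 0], but with each element replaced by its position (index) in uniquniq.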

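4) Next we build a table of unique values x columns

5) and put all the counts in the appropriate cells.

6) We then compute the cumulative sums of the counts, going from the last column to the first. (It is not actually necessary to use cumulative sums, but it doesn't hurt either.)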

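7) We now have all the pieces needed to order the unique values. lexsort is an indirect sort like argsort, except that you can sort by several vectors; the last vector passed is considered first. There we put cumcnts[:, 2], the total count; next (in case of a tie) comes cumcnts[:, 0], the count in the last column; and finally cumcnts[:, 1], the count in the last and middle columns combined. As mentioned in (6), we could just as well use the count of the middle column alone.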

8) lexsort returns an index (order), which we use to put the unique values and their counts in the correct order.
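
Since the question asks for something that works on any such array, the steps above can be wrapped up roughly as follows. This is only a sketch: the function name rank_by_cumulative_counts is made up, it builds the count table with np.add.at instead of the pairwise unique call, and it does not implement the distance tie-break from the question (remaining ties fall back to the larger value first).

```python
import numpy as np

def rank_by_cumulative_counts(a):
    """Order unique values by total count, ties broken column by column
    starting from the last column (hypothetical helper, any number of columns)."""
    a = np.asarray(a)
    n_cols = a.shape[1]
    # Unique values and, for every element, its position among them.
    values, inverse = np.unique(a.ravel(), return_inverse=True)
    inverse = inverse.reshape(a.shape)
    # Table of counts: one row per unique value, one column per column of a.
    counts = np.zeros((len(values), n_cols), dtype=int)
    cols = np.broadcast_to(np.arange(n_cols), a.shape)
    np.add.at(counts, (inverse, cols), 1)
    # Cumulative counts from the last column to the first.
    cum = np.cumsum(counts[:, ::-1], axis=1)
    # lexsort treats the LAST key as primary: total first, then cum[:, 0], cum[:, 1], ...
    keys = tuple(cum[:, k] for k in range(n_cols - 2, -1, -1)) + (cum[:, -1],)
    order = np.lexsort(keys)[::-1]
    return list(zip(values[order].tolist(), cum[order, -1].tolist()))

# The two index arrays from the question.
first = np.array([[3, 2, 6], [3, 4, 6], [0, 4, 6], [5, 4, 6],
                  [3, 2, 6], [2, 3, 6], [0, 1, 5]])
check = np.array([[2, 3, 6], [2, 4, 5], [0, 4, 3], [1, 4, 6],
                  [2, 3, 5], [3, 2, 6], [0, 1, 5]])

top3 = [value for value, _ in rank_by_cumulative_counts(first)[:3]]
ranked_check = rank_by_cumulative_counts(check)
print(top3)          # [6, 3, 4]
print(ranked_check)  # [(3, 4), (2, 4), (6, 3), (5, 3), (4, 3), (1, 2), (0, 2)]
```

For three columns the keys passed to lexsort are the same as in the code above, so the validation array gives the same ordering, and the first array gives the expected [6, 3, 4].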
