Question

In general, I am trying to split a distance matrix into K folds. Specifically, for the 3 x 3 case, my distance matrix might look like this:

import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])

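I also have a randomly generated list of assignments whose length equals the sum of all elements in the distance matrix. For the K = 3 case, it might look like this: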

assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])
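
For reference, one way such a list could be generated is sketched below. This is an assumption, since the question does not show how the assignments are produced: it draws one fold label per unit of distance, uniformly at random. The names `K` and `rng` are illustrative, and `np.random.default_rng` needs NumPy 1.17 or later.

```python
import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])

K = 3  # number of folds (illustrative name, not from the question)
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# One fold label in [0, K) for every unit counted in `full`.
assignments = rng.integers(0, K, size=full.sum())
```

With random labels a fold may receive nothing at all, in which case `np.max(assignments) + 1` can be smaller than K.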

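I want to create K = 3 new 3 x 3 zero matrices, into which the values of the distance matrix are "distributed" according to the assignment list. Code is more precise than words, so here is my current attempt: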

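def assign(full, assignments):
    # One zero matrix per fold; fold labels are assumed to run from 0 to max(assignments).
    folds = [np.zeros(full.shape) for _ in range(np.max(assignments) + 1)]
    rows, cols = full.shape
    a = 0  # position of the next unused entry in assignments
    for r in range(rows):
        for c in range(cols):
            # Hand out each unit of full[r, c] to the fold named by the next assignment.
            for i in range(full[r, c]):
                folds[assignments[a]][r, c] += 1
                a += 1
    return folds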

This works (slowly), and for this example,

folds = assign(full, assignments)
for f in folds:
    print(f)

prints

[[ 0.  0.  2.]
 [ 0.  0.  0.]
 [ 1.  1.  0.]]
[[ 0.  0.  1.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]]
[[ 0.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  0.]]

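as desired. However, my current attempt is very slow, especially for the N x N case with large N. How can I speed up this function? Is there some NumPy magic I should be using here?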

One idea I had is to convert to a sparse matrix and loop over the nonzero entries. However, this would only help somewhat.
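
As a rough sketch of that idea, the loop can be restricted to the nonzero cells without a sparse library. This uses `np.argwhere`, which visits positions in the same row-major order as the nested loops. The function name `assign_nonzero` is hypothetical, and the Python-level loop still runs once per nonzero cell, which is why the gain is limited.

```python
import numpy as np

def assign_nonzero(full, assignments):
    n_folds = np.max(assignments) + 1
    folds = np.zeros((n_folds,) + full.shape, dtype=int)
    a = 0  # start of the current cell's slice of assignments
    # np.argwhere yields nonzero positions in row-major order,
    # matching the order in which the nested loops consume assignments.
    for r, c in np.argwhere(full):
        count = full[r, c]
        # Count how many of this cell's units went to each fold.
        folds[:, r, c] = np.bincount(assignments[a:a + count],
                                     minlength=n_folds)
        a += count
    return folds

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])
folds = assign_nonzero(full, assignments)
```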

Answer 1

You just need to figure out which items of the flattened output get incremented each time, and then aggregate them with bincount:

def assign(full, assignments):
    assert len(assignments) == np.sum(full)

    rows, cols = full.shape
    n = np.max(assignments) + 1

    full_flat = full.reshape(-1)
    full_flat_non_zero = full_flat != 0
    full_flat_indices = np.repeat(np.where(full_flat_non_zero)[0],
                                  full_flat[full_flat_non_zero])
    folds_flat_indices = full_flat_indices + assignments*rows*cols

    return np.bincount(folds_flat_indices,
                       minlength=n*rows*cols).reshape(n, rows, cols)

>>> assign(full, assignments)
array([[[0, 0, 2],
        [0, 0, 0],
        [1, 1, 0]],

       [[0, 0, 1],
        [0, 0, 1],
        [1, 0, 0]],

       [[0, 0, 0],
        [1, 0, 0],
        [0, 0, 0]]])

You may want to print out each of the intermediate arrays to see what is going on.
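
For the 3 x 3 example from the question, the intermediate arrays come out as follows. This is only a walk-through of the function body with the values written as comments; the variable names are the ones used in the answer.

```python
import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])
rows, cols = full.shape

full_flat = full.reshape(-1)
# [0 0 3 1 0 1 2 1 0]

full_flat_non_zero = full_flat != 0
# [False False  True  True False  True  True  True False]

# Each nonzero flat position, repeated as many times as its value.
full_flat_indices = np.repeat(np.where(full_flat_non_zero)[0],
                              full_flat[full_flat_non_zero])
# [2 2 2 3 5 6 6 7]

# Shift each position into the block of the fold it is assigned to
# (fold k occupies flat positions k*9 .. k*9 + 8 of the output).
folds_flat_indices = full_flat_indices + assignments * rows * cols
# [ 2 11  2 21 14 15  6  7]
```

Position 2 appears twice in the last array, so bincount puts a 2 at row 0, column 2 of fold 0.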

Answer 2

You can use add.at to perform an unbuffered in-place operation:

import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])

assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])

res = np.zeros((np.max(assignments) + 1,) + full.shape, dtype=int)

r, c = np.nonzero(full)
n = full[r, c]

r = np.repeat(r, n)
c = np.repeat(c, n)

np.add.at(res, (assignments, r, c), 1)

print(res)
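
The reason add.at is needed rather than a plain indexed `+=` can be seen in a small comparison on the same data. With fancy indexing, `+=` is buffered, so an index triple that occurs more than once is only incremented once. The names `buffered` and `unbuffered` are illustrative.

```python
import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])

r, c = np.nonzero(full)
n = full[r, c]
r = np.repeat(r, n)
c = np.repeat(c, n)
shape = (np.max(assignments) + 1,) + full.shape

# Buffered: the repeated index (0, 0, 2) is incremented only once.
buffered = np.zeros(shape, dtype=int)
buffered[assignments, r, c] += 1

# Unbuffered: every occurrence of an index accumulates.
unbuffered = np.zeros(shape, dtype=int)
np.add.at(unbuffered, (assignments, r, c), 1)

print(buffered.sum(), unbuffered.sum())      # 7 8
print(buffered[0, 0, 2], unbuffered[0, 0, 2])  # 1 2
```

Only the unbuffered result sums to `full.sum()`, which is 8 here.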

A faster way to "distribute" the values of an ndarray into other ndarrays based on assignments?
