Faster way to "distribute" the values of an ndarray to other ndarrays based on an assignment?

Time: 2016-03-21 03:35:03

Tags: python numpy matrix

In general, I am trying to split a distance matrix into K folds. Specifically, for the 3 x 3 case, my distance matrix might look like this:

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])

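I also have a randomly generated list of assignments whose length equals the sum of all elements in the distance matrix. For the K = 3 case, it might look like this: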

assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])
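
For experimenting with other inputs, an assignment vector of the right length can be drawn at random. This is a minimal sketch, assuming uniform fold probabilities and NumPy's Generator API; the seed and the name `K` are illustrative choices, not part of the question.

```python
import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
K = 3  # number of folds (assumed name)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
# One fold label in [0, K) per unit of count in `full`.
assignments = rng.integers(0, K, size=full.sum())

# The length must match the total of all entries (8 for this example).
assert len(assignments) == full.sum()
```

Note that a random draw may leave the highest fold empty, in which case `np.max(assignments) + 1` is smaller than `K`.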

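I want to create K = 3 new 3 x 3 zero matrices in which the values of the distance matrix are "distributed" according to the assignment list. Code is more precise than words, so here is my current attempt: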

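def assign(full, assignments):
    # One zero matrix per fold; fold labels are assumed to run from 0 to max.
    folds = [np.zeros(full.shape) for _ in range(np.max(assignments) + 1)]
    rows, cols = full.shape
    a = 0  # position in the assignment list
    for r in range(rows):
        for c in range(cols):
            # Hand out each unit of full[r, c] to its assigned fold.
            for i in range(full[r, c]):
                folds[assignments[a]][r, c] += 1
                a += 1
    return folds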

This works (slowly), and in this example,

folds = assign(full, assignments)
for f in folds:
    print(f)

prints

[[ 0.  0.  2.]
 [ 0.  0.  0.]
 [ 1.  1.  0.]]
[[ 0.  0.  1.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]]
[[ 0.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  0.]]

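as desired. However, my current attempt is very slow, especially for the N x N case with large N. How can I speed up this function? Is there some magic I should be using here?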

One idea I had was to convert to a sparse matrix and loop over the non-zero entries. However, this would only help somewhat.
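
One way to read that idea, sketched here with `np.nonzero` instead of a sparse-matrix library: visit only the non-zero cells and count each cell's labels in one call. The function name `assign_nonzero` is hypothetical and this is not the asker's code. It skips the zero cells but still does Python-level work per non-zero cell.

```python
import numpy as np

def assign_nonzero(full, assignments):
    n_folds = np.max(assignments) + 1
    folds = np.zeros((n_folds,) + full.shape, dtype=int)
    a = 0  # position in the assignment list
    # np.nonzero yields indices in row-major order, matching the nested loops.
    for r, c in zip(*np.nonzero(full)):
        count = full[r, c]
        labels = assignments[a:a + count]
        # Count how many of this cell's units go to each fold.
        folds[:, r, c] = np.bincount(labels, minlength=n_folds)
        a += count
    return folds

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])
folds = assign_nonzero(full, assignments)
```

Here `folds` is a single array of shape (3, 3, 3) rather than a list of matrices, with integer counts instead of floats.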

2 answers:

Answer 0: (score: 1)

You just need to work out which item of the flattened output gets incremented at each step, then sum them up with bincount:

def assign(full, assignments):
    assert len(assignments) == np.sum(full)

    rows, cols = full.shape
    n = np.max(assignments) + 1

    full_flat = full.reshape(-1)
    full_flat_non_zero = full_flat != 0
    full_flat_indices = np.repeat(np.where(full_flat_non_zero)[0],
                                  full_flat[full_flat_non_zero])
    folds_flat_indices = full_flat_indices + assignments*rows*cols

    return np.bincount(folds_flat_indices,
                       minlength=n*rows*cols).reshape(n, rows, cols)

>>> assign(full, assignments)
array([[[0, 0, 2],
        [0, 0, 0],
        [1, 1, 0]],

       [[0, 0, 1],
        [0, 0, 1],
        [1, 0, 0]],

       [[0, 0, 0],
        [1, 0, 0],
        [0, 0, 0]]])

You may want to print out each intermediate array to see exactly what is going on.
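
As a sketch of that suggestion, the intermediate arrays for the example data are spelled out below. The variable names follow the function above; the values in the comments were worked out by hand for this 3 x 3 input.

```python
import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])

rows, cols = full.shape
full_flat = full.reshape(-1)            # [0 0 3 1 0 1 2 1 0]
full_flat_non_zero = full_flat != 0

# Flat index of each non-zero cell, repeated once per unit of its count.
full_flat_indices = np.repeat(np.where(full_flat_non_zero)[0],
                              full_flat[full_flat_non_zero])
# full_flat_indices: [2 2 2 3 5 6 6 7]

# Shift each index into the 9-element block of its assigned fold.
folds_flat_indices = full_flat_indices + assignments * rows * cols
# folds_flat_indices: [2 11 2 21 14 15 6 7]
```

Index 2 appears twice in the last array, which is why bincount puts a 2 at position (0, 2) of fold 0.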

Answer 1: (score: 1)

You can use add.at to perform an unbuffered in-place operation:

import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])

assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])

res = np.zeros((np.max(assignments) + 1,) + full.shape, dtype=int)

r, c = np.nonzero(full)
n = full[r, c]

r = np.repeat(r, n)
c = np.repeat(c, n)

np.add.at(res, (assignments, r, c), 1)

print(res)
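
The "unbuffered" part matters because the index triples repeat: cell (0, 2) is assigned to fold 0 twice. The short sketch below contrasts `np.add.at` with a plain fancy-indexed `+=` on the same indices; the names `buffered` and `unbuffered` are illustrative only.

```python
import numpy as np

full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])

r, c = np.nonzero(full)
n = full[r, c]
r = np.repeat(r, n)
c = np.repeat(c, n)
shape = (np.max(assignments) + 1,) + full.shape

buffered = np.zeros(shape, dtype=int)
buffered[assignments, r, c] += 1   # a repeated index is incremented only once

unbuffered = np.zeros(shape, dtype=int)
np.add.at(unbuffered, (assignments, r, c), 1)  # every occurrence is counted

print(buffered[0, 0, 2], unbuffered[0, 0, 2])  # 1 2
```

Summing `unbuffered` over its first axis gives back `full`, which is a quick sanity check that no unit was lost.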