Question

I have two very large matrices and need to compute the transition matrix between them. For example, matrix A:

1 2 3
3 2 1
2 1 3

Matrix B:

3 2 1
1 2 3
3 2 1

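The transition matrix should then be (rows are values in A, columns are values in B, and each entry is the fraction of positions holding the row value in A that hold the column value in B):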

   1    2    3
1  0   1/3  2/3
2  0   2/3  1/3
3  1    0    0

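I am currently using nested for loops to iterate over both matrices and increment the counts in my transition matrix, but it is very slow. Is there a more efficient way? Thanks!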

Answer 1

I assume a and b are NumPy arrays. You can build the transition matrix (TM) as a SciPy sparse matrix:

import numpy as np 
import scipy.sparse as sp
from itertools import chain
from collections import Counter

a = np.array([[1,2,3],[3,2,1],[2,1,3]])
b = np.array([[3,2,1],[1,2,3],[3,2,1]])

Find and count all transitions that actually occur:

cntr = Counter(chain.from_iterable(list(zip(*x)) for x in (zip(a,b))))
#Counter({(3, 1): 3, (1, 3): 2, (2, 2): 2, (1, 2): 1, (2, 3): 1})

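Build a sparse count matrix whose rows and columns represent states:

# tuple(...) turns the one-shot zip iterator into a real (rows, cols) pair,
# which is the safer form to pass under Python 3
transition = sp.csr_matrix((list(cntr.values()), tuple(zip(*cntr.keys()))))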

Normalize each row so that it sums to 1 (row and column 0 are dropped because the states start at 1):

transition[1:,1:] / transition[1:,1:].sum(axis=1)
#array([[ 0.        ,  0.33333333,  0.66666667],
#       [ 0.        ,  0.66666667,  0.33333333],
#       [ 1.        ,  0.        ,  0.        ]])
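
For readers without SciPy, the same count-then-normalize idea can be sketched with the standard library alone. The helper name `transition_probs` and its nested-dict output are my own choices for illustration, not part of the answer above, and this version is meant to show the logic rather than to be fast on very large inputs.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def transition_probs(a, b):
    """Hypothetical helper: map each state in a to {state in b: probability}."""
    # Count every (from_state, to_state) pair at matching positions.
    pairs = Counter(
        (x, y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )
    # Total number of transitions leaving each state.
    totals = Counter()
    for (x, _), n in pairs.items():
        totals[x] += n
    # Divide each count by the total for its source state.
    probs = defaultdict(dict)
    for (x, y), n in pairs.items():
        probs[x][y] = Fraction(n, totals[x])
    return dict(probs)

a = [[1, 2, 3], [3, 2, 1], [2, 1, 3]]
b = [[3, 2, 1], [1, 2, 3], [3, 2, 1]]
probs = transition_probs(a, b)
```

Exact fractions are used here only so the result can be compared directly with the 1/3 and 2/3 entries in the question; floats would work just as well.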

Answer 2

A more general transition-matrix constructor using np.add.at:

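import numpy as np

def trans(A, B):
    # Map the values in A and B to 0-based indices into their sorted unique values
    Au, Ar = np.unique(A, return_inverse=True)
    Bu, Br = np.unique(B, return_inverse=True)
    indices = (Ar.ravel(), Br.ravel())
    out = np.zeros((Au.size, Bu.size))
    # Unbuffered add: repeated (row, col) pairs are each counted
    np.add.at(out, indices, 1)
    # keepdims=True keeps the row sums as a column, so each row is divided by its own total
    out /= out.sum(axis=1, keepdims=True)
    return out, Au, Bu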

trans(A, B)
Out:
(array([[ 0.        ,  0.33333333,  0.66666667],
        [ 0.        ,  0.66666667,  0.33333333],
        [ 1.        ,  0.        ,  0.        ]]),
 array([1, 2, 3]),
 array([1, 2, 3]))
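
One detail worth checking when normalizing a count matrix by rows is the broadcasting direction. Dividing by a 1-D vector of row sums divides column j by the sum of row j. That only matches row normalization when all row sums happen to be equal, as they are in the question's example. A small sketch with made-up counts and unequal row sums (the array values are hypothetical):

```python
import numpy as np

# Hypothetical count matrix whose rows sum to 4 and 8.
counts = np.array([[1.0, 3.0],
                   [6.0, 2.0]])

# Row sums kept as a (2, 1) column: each row is divided by its own total.
row_normalized = counts / counts.sum(axis=1, keepdims=True)

# Row sums as a flat (2,) vector: broadcasting lines them up with the columns.
column_divided = counts / counts.sum(axis=1)

print(row_normalized.sum(axis=1))   # every row sums to 1
print(column_divided.sum(axis=1))   # rows no longer sum to 1
```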

Answer 3

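Same overall approach as @DanielF's, but a faster implementation (10x in my test case). The trick is to avoid np.add.at, which is very useful but not the fastest option. I have omitted the steps that are identical in both variants (finding the unique values and normalizing to probabilities).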

>>> import numpy as np
>>> from timeit import repeat
>>> 
>>> A = np.random.randint(0, 100, (100, 100))
>>> B = np.random.randint(0, 100, (100, 100))
>>> 
>>> def f_df(A, B):
...     out = np.zeros((100, 100), int)
...     np.add.at(out, (A.ravel(), B.ravel()), 1)
...     return out
... 
>>> def f_pp(A, B):
...     return np.bincount(np.ravel_multi_index((A, B), (100, 100)).ravel(), minlength=10000).reshape(100, 100)
... 
>>> np.all(f_df(A, B) == f_pp(A, B))
True
>>> 
>>> repeat('f_df(A, B)', globals=globals(), number=1000)
[0.7909002639353275, 0.7779529448598623, 0.7819221799727529]
>>> repeat('f_pp(A, B)', globals=globals(), number=1000)
[0.07678529410623014, 0.07394189992919564, 0.0735252988524735]
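
Since the timing comparison leaves out the unique-value lookup and the normalization, here is a sketch of how the bincount counting could be combined with those two steps. The function name `transition_matrix` is hypothetical, and this is one possible arrangement under the assumption that A and B have the same shape, not the answerer's own full code.

```python
import numpy as np

def transition_matrix(A, B):
    """Sketch: row-normalized transition matrix from values in A to values in B."""
    A = np.asarray(A)
    B = np.asarray(B)
    # Map arbitrary state labels to 0..n-1 indices (flattened first).
    a_states, a_idx = np.unique(A.ravel(), return_inverse=True)
    b_states, b_idx = np.unique(B.ravel(), return_inverse=True)
    shape = (a_states.size, b_states.size)
    # Encode each (row, col) pair as one flat index, then count occurrences.
    flat = np.ravel_multi_index((a_idx, b_idx), shape)
    counts = np.bincount(flat, minlength=shape[0] * shape[1]).reshape(shape)
    # Divide each row by its total; keepdims keeps the sums as a column.
    probs = counts / counts.sum(axis=1, keepdims=True)
    return probs, a_states, b_states

A = np.array([[1, 2, 3], [3, 2, 1], [2, 1, 3]])
B = np.array([[3, 2, 1], [1, 2, 3], [3, 2, 1]])
probs, a_states, b_states = transition_matrix(A, B)
```

Every state returned by np.unique for A occurs at least once, so no row total is zero and the division is safe.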
