In a research paper, the authors introduce an "outer product" between two (3 × 3) matrices A and B, yielding C:

C(i, j) = sum(k=1..3, l=1..3, m=1..3, n=1..3) eps(i,k,l)*eps(j,m,n)*A(k,m)*B(l,n)

where eps(a, b, c) is the Levi-Civita symbol.

I would like to know how to vectorize such an operator in NumPy instead of naively implementing six nested loops (over i, j, k, l, m, n).
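For reference, a direct (slow) translation of the definition into six nested loops, together with one way to build the Levi-Civita tensor eps (my own sketch, not from the paper):

```python
import numpy as np

# Levi-Civita symbol as a (3, 3, 3) array:
# +1 for even permutations, -1 for odd permutations, 0 otherwise.
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

def outer_product_naive(A, B):
    """Reference implementation: six nested loops over i, j, k, l, m, n."""
    C = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            for k in range(3):
                for l in range(3):
                    for m in range(3):
                        for n in range(3):
                            C[i, j] += (eps[i, k, l] * eps[j, m, n]
                                        * A[k, m] * B[l, n])
    return C
```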
Answer 0 (score: 4)
You can use einsum, which implements Einstein summation notation:
C = np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)
Or, for better performance, apply einsum to two arrays at a time:
C = np.einsum('ilm,jml->ij',
              np.einsum('ikl,km->ilm', eps, A),
              np.einsum('jmn,ln->jml', eps, B))
np.einsum computes a sum of products. The subscript specifier 'ikl,jmn,km,ln->ij' tells np.einsum that the first eps has subscripts i,k,l, the second eps has subscripts j,m,n, A has subscripts k,m, B has subscripts l,n, and the output array has subscripts i,j. Thus, the summation runs over products of the form

eps(i,k,l) * eps(j,m,n) * A(k,m) * B(l,n)

with every subscript that does not appear in the output array summed out.
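These rules can be seen in isolation with a few smaller einsum calls (illustrative examples of mine, not from the answer):

```python
import numpy as np

M = np.arange(6.0).reshape(2, 3)
v = np.array([1.0, 2.0, 3.0])

# 'ij,j->i': j is absent from the output, so it is summed out -> matrix-vector product
print(np.einsum('ij,j->i', M, v))    # same result as M @ v

# 'ij->ji': no subscript is dropped, so nothing is summed -> transpose
print(np.einsum('ij->ji', M).shape)  # (3, 2)

# 'ii->': repeated subscript on one operand with empty output -> trace
sq = np.arange(9.0).reshape(3, 3)
print(np.einsum('ii->', sq))         # 0 + 4 + 8 = 12.0
```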
Answer 1 (score: 4)
This looks like a purely reduction-based problem, with no axis alignment to maintain between the inputs. Therefore, I would suggest a matrix-multiplication-based solution for tensors using np.tensordot.

A solution can thus be implemented in three steps -
# Matrix-multiplication between first eps and A.
# Thus losing second axis from eps and first from A : k
parte1 = np.tensordot(eps,A,axes=((1),(0)))
# Matrix-multiplication between second eps and B.
# Thus losing third axis from eps and second from B : n
parte2 = np.tensordot(eps,B,axes=((2),(1)))
# Finally, we are left with two products : ilm & jml.
# We need to lose lm and ml from these inputs respectively to get ij.
# So, we need to lose the last two dims from the products, but flipped.
out = np.tensordot(parte1,parte2,axes=((1,2),(2,1)))
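As a sanity check (my own, assuming eps is the 3 × 3 × 3 Levi-Civita tensor built as below), the three tensordot steps reproduce the single einsum call:

```python
import numpy as np

# Levi-Civita tensor assumed by both answers.
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

rng = np.random.default_rng(42)
A = rng.random((3, 3))
B = rng.random((3, 3))

# The three tensordot steps from the answer above.
parte1 = np.tensordot(eps, A, axes=((1,), (0,)))        # ikl,km -> ilm
parte2 = np.tensordot(eps, B, axes=((2,), (1,)))        # jmn,ln -> jml
out = np.tensordot(parte1, parte2, axes=((1, 2), (2, 1)))

ref = np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)
assert np.allclose(out, ref)
```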
Runtime test

Approaches -
def einsum_based1(eps, A, B):  # @unutbu's soln1
    return np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)

def einsum_based2(eps, A, B):  # @unutbu's soln2
    return np.einsum('ilm,jml->ij',
                     np.einsum('ikl,km->ilm', eps, A),
                     np.einsum('jmn,ln->jml', eps, B))

def tensordot_based(eps, A, B):
    parte1 = np.tensordot(eps, A, axes=((1), (0)))
    parte2 = np.tensordot(eps, B, axes=((2), (1)))
    return np.tensordot(parte1, parte2, axes=((1, 2), (2, 1)))
Timings -
In [5]: # Setup inputs
...: N = 20
...: eps = np.random.rand(N,N,N)
...: A = np.random.rand(N,N)
...: B = np.random.rand(N,N)
...:
In [6]: %timeit einsum_based1(eps, A, B)
1 loops, best of 3: 773 ms per loop
In [7]: %timeit einsum_based2(eps, A, B)
1000 loops, best of 3: 972 µs per loop
In [8]: %timeit tensordot_based(eps, A, B)
1000 loops, best of 3: 214 µs per loop
On a larger dataset -
In [12]: # Setup inputs
...: N = 100
...: eps = np.random.rand(N,N,N)
...: A = np.random.rand(N,N)
...: B = np.random.rand(N,N)
...:
In [13]: %timeit einsum_based2(eps, A, B)
1 loops, best of 3: 856 ms per loop
In [14]: %timeit tensordot_based(eps, A, B)
10 loops, best of 3: 49.2 ms per loop