Problem setup: I have a 3D (spatial) data grid of size n1, n2, n3 = nx, ny, nz, where nz may be 1 for a or b. Each point in the grid carries a data vector (a) of size NDIM (typically 4) per grid point, and a matrix (b) of size NDIM x NDIM per grid point. I want to compute things like a.b or b.a (per point) as efficiently as possible in both memory and CPU.
Basically, I want to generalize "A loopless 3D matrix multiplication in python". I seem to have something that works, but I don't understand why it works. Searching Google and Stack Overflow has not helped. Please explain and generalize further! Thanks!
import numpy as np
# gives a.b per point:
nx=5
ny=8
nz=3
a = np.arange(nx*ny*nz*4).reshape(4, nx,ny,nz)
b = np.arange(nx*ny*1*4*4).reshape(4, 4, nx,ny,1)
ctrue=a*0.0
for ii in np.arange(0,nx):
    for jj in np.arange(0,ny):
        for kk in np.arange(0,nz):
            ctrue[:,ii,jj,kk] = np.tensordot(a[:,ii,jj,kk],b[:,:,ii,jj,0],axes=[0,1])
c2 = (a[:,None,None,None] * b[:,:,None,None,None]).sum(axis=1).reshape(4,nx,ny,nz)
np.sum(ctrue-c2)
# gives 0 as required
# gives b.a per point:
ctrue2=a*0.0
for ii in np.arange(0,nx):
    for jj in np.arange(0,ny):
        for kk in np.arange(0,nz):
            ctrue2[:,ii,jj,kk] = np.tensordot(a[:,ii,jj,kk],b[:,:,ii,jj,0],axes=[0,0])
btrans=np.transpose(b,(1,0,2,3,4))
c22 = (a[:,None,None,None] * btrans[:,:,None,None,None]).sum(axis=1).reshape(4,nx,ny,nz)
np.sum(ctrue2-c22)
# gives 0 as required
# Note that only the single line for c2 and c22 are required -- the rest of the code is for testing/comparison to see if that line works.
# Issues/Questions:
# 1) Please explain why those things work and further generalize!
# 2) After reading about None=np.newaxis, I thought something like this would work:
c22alt = (a[:,None,:,:,:] * btrans[:,:]).sum(axis=1).reshape(4,nx,ny,nz)
np.sum(ctrue2-c22alt)
# but it doesn't.
# 3) I don't see how to avoid assignment of a separate btrans. An np.transpose on b[:,:,None,None,None] doesn't work.
Other related links: Numpy: Multiplying a matrix with a 3d tensor -- Suggestion, and How to use numpy with 'None' value in Python?
Answer 0 (score: 1)
First of all, your code is overly complicated. The products a.b and b.a can be simplified to:
c2 = (a * b).sum(axis=1)
c22 = (a * b.swapaxes(0, 1)).sum(axis=1)
Note that you should use np.all(ctrue == c2) rather than np.sum(ctrue - c2); the latter can report 0 even when the results differ, as long as the two arrays happen to have the same sum!
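As a quick sanity check, here is a minimal sketch comparing the simplified expressions against the loop results from the question (np.array_equal does an exact element-wise comparison):
# Exact element-wise comparison against the explicit loops above:
print(np.array_equal(ctrue, (a * b).sum(axis=1)))                  # expect True
print(np.array_equal(ctrue2, (a * b.swapaxes(0, 1)).sum(axis=1)))  # expect True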
Why does this work? Consider a single point:
a0 = a[:, 0, 0, 0]
b0 = b[:, :, 0, 0, 0]
Taking the tensor dot np.tensordot(a0, b0, axes=(0, 1)) is equivalent to (a0 * b0).sum(axis=1). This is because of broadcasting: the (4,) shape of a0 is broadcast against the (4, 4) shape of b0, the arrays are multiplied element-wise, and summing over axis 1 then gives the tensor dot.
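Spelled out as a small sketch for that single point, using a0 and b0 as defined above:
# a0 has shape (4,) and b0 has shape (4, 4); a0 is broadcast across the rows
# of b0, so (a0 * b0)[i, j] == a0[j] * b0[i, j], and summing over axis 1
# contracts against b0's second axis, just like tensordot with axes=(0, 1).
print(np.allclose(np.tensordot(a0, b0, axes=(0, 1)), (a0 * b0).sum(axis=1)))  # expect True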
For the other dot product, np.tensordot(a0, b0, axes=(0, 0)) is equivalent to (a0 * b0.T).sum(axis=1), where b0.T is the same as b0.transpose(), which is the same as b0.swapaxes(0, 1). By transposing b0, a0 is effectively broadcast against the other axis of b0; we could get the same result with (a0[:, None] * b0).sum(axis=0).
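Again as a small sketch, both forms contract a0 against the first axis of b0:
lhs = np.tensordot(a0, b0, axes=(0, 0))                   # lhs[i] = sum_j a0[j] * b0[j, i]
print(np.allclose(lhs, (a0 * b0.T).sum(axis=1)))          # expect True
print(np.allclose(lhs, (a0[:, None] * b0).sum(axis=0)))   # expect True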
The nice thing about NumPy's element-wise operations is that, as long as the shapes correspond or can be broadcast, the higher axes can be ignored entirely, so what works for a0 and b0 also works (for the most part) for the full arrays a and b.
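To see how the higher axes line up for the full arrays, it can help to inspect the broadcast shape directly (a small sketch; np.broadcast is a standard NumPy helper):
# a has shape (4, nx, ny, nz) and b has shape (4, 4, nx, ny, 1). Aligned from
# the right, a's leading length-4 axis pairs with b's second axis, and b's
# trailing size-1 axis broadcasts over nz.
print(np.broadcast(a, b).shape)       # (4, 4, nx, ny, nz)
print((a * b).sum(axis=1).shape)      # (4, nx, ny, nz), the same shape as ctrue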
Finally, we can make this even clearer by using Einstein summation:
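A sketch of what the einsum equivalents could look like for the shapes used here (b's trailing size-1 axis is dropped so every subscript has a definite length; c2_es and c22_es are just illustrative names):
# c2_es[i, x, y, z]  = sum_j b[i, j, x, y, 0] * a[j, x, y, z]   (a.b per point)
# c22_es[i, x, y, z] = sum_j b[j, i, x, y, 0] * a[j, x, y, z]   (b.a per point)
c2_es = np.einsum('ijxy,jxyz->ixyz', b[:, :, :, :, 0], a)
c22_es = np.einsum('jixy,jxyz->ixyz', b[:, :, :, :, 0], a)
print(np.allclose(ctrue, c2_es), np.allclose(ctrue2, c22_es))  # expect True True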