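I have a 3D array (z below) representing, for example, a succession of 2D arrays in time (a1 and a2 below). I want to select some values from all of these 2D arrays along their axes (conditions on the two reference axes, x and y below), and then perform some operation (e.g. mean, sum, ...) on the resulting succession of "smaller" 2D arrays.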
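The code below proposes several ways to do this. I find solution1 very inelegant, but it seems to run faster than solution2. Why is that, and is there a better way (more concise, and more efficient in both speed and memory)?
Regarding Step 2, which option is best, are there other more efficient options, and why does the calculation of C2 not work? Thanks!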
[Inspired by: Get mean of 2D slice of a 3D array in numpy]
import numpy
import time
# Control parameters (to be modified to make different tests)
xx=1000
yy=6000
# Some 2D arrays; z is a 3D array containing a succession of such arrays (2 here)
a1=numpy.arange(xx*yy).reshape((yy, xx))
a2=numpy.linspace(0,100, num=xx*yy).reshape((yy, xx))
z=numpy.array((a1, a2))
# Axes x and y along which conditioning for the 2D arrays is made
x=numpy.arange(xx)
y=numpy.arange(yy)
# Condition is on x and y, to be applied on a1 and a2 simultaneously
xmin, xmax = xx*0.4, xx*0.8
ymin, ymax = yy*0.2, yy*0.5
xcond = numpy.logical_and(x>=xmin, x<=xmax)
ycond = numpy.logical_and(y>=ymin, y<=ymax)
def solution1():
    xcond2D = numpy.tile(xcond, (yy, 1))
    ycond2D = numpy.tile(ycond[numpy.newaxis].transpose(), (1, xx))
    xymask = numpy.logical_not(numpy.logical_and(xcond2D, ycond2D))
    xymaskzdim = numpy.tile(xymask, (z.shape[0], 1, 1))
    return numpy.ma.MaskedArray(z, xymaskzdim)
def solution2():
    return z[:,:,xcond][:,ycond, :]
start=time.clock()
z1=solution1()
end=time.clock()
print "Solution1: %s sec" % (end-start)
start=time.clock()
z2=solution2()
end=time.clock()
print "Solution2: %s sec" % (end-start)
# Step 2
# Now compute some calculation on the resulting z1 or z2
print "A1: ", z2.reshape(z2.shape[0], z2.shape[1]*z2.shape[2]).mean(axis=1)
print "A2: ", z1.reshape(z1.shape[0], z1.shape[1]*z1.shape[2]).mean(axis=1)
print "B1: ", z2.mean(axis=2).mean(axis=1)
print "B2: ", z1.mean(axis=2).mean(axis=1)
print "Numpy version: ", numpy.version.version
print "C1: ", z2.mean(axis=(1, 2))
print "C2: ", z1.mean(axis=(1, 2))
Output:
Solution1: 0.0568935728474 sec
Solution2: 0.157177904729 sec
A1: [ 2.10060000e+06 3.50100058e+01]
A2: [2100600.0 35.01000583500077]
B1: [ 2.10060000e+06 3.50100058e+01]
B2: [2100600.0 35.010005835000975]
Numpy version: 1.7.1
C1: [ 2.10060000e+06 3.50100058e+01]
C2:
TypeError: tuple indices must be integers, not tuple
Answer 0 (score: 3)
You can get a speedup by switching the order of the selections:
import timeit
def solution3():
    return z[:,ycond, :][...,xcond]
N = 10
print timeit.timeit("solution1()", setup="from __main__ import solution1, solution2, solution3, z, xcond, ycond, xx, yy", number=N)
print timeit.timeit("solution2()", setup="from __main__ import solution1, solution2, solution3, z, xcond, ycond, xx, yy", number=N)
print timeit.timeit("solution3()", setup="from __main__ import solution1, solution2, solution3, z, xcond, ycond, xx, yy", number=N)
# 0.439269065857 # solution1
# 0.752536058426 # solution2
# 0.340197086334 # solution3
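As a sanity check on the reordering, the following minimal sketch (written for Python 3, on a small hypothetical array rather than the arrays from the question) confirms that the two selection orders return identical results. It also shows `numpy.ix_`, which is not part of the original answer and is included only as an assumption about a possible one-step alternative; its speed is not measured here.

```python
import numpy as np

# Small stand-in for z: 2 "time steps" of a 6 x 5 grid (hypothetical sizes).
z_small = np.arange(2 * 6 * 5, dtype=float).reshape(2, 6, 5)
xcond_small = np.array([False, True, True, True, False])         # condition on the last axis
ycond_small = np.array([False, False, True, True, True, False])  # condition on the middle axis

# solution2 order: select along x (last axis) first, then along y.
sel_xy = z_small[:, :, xcond_small][:, ycond_small, :]
# solution3 order: select along y (middle axis) first, then along x.
sel_yx = z_small[:, ycond_small, :][..., xcond_small]

# One-step alternative (an assumption, not from the answer):
# build an open mesh of integer indices and index once.
idx = np.ix_(np.arange(z_small.shape[0]),
             np.flatnonzero(ycond_small),
             np.flatnonzero(xcond_small))
sel_ix = z_small[idx]

print(sel_xy.shape)
print(np.array_equal(sel_xy, sel_yx), np.array_equal(sel_xy, sel_ix))
```

All three variants produce a new array holding only the selected rectangle, so they differ in speed and readability, not in the result.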
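The calculation of C2 doesn't work because masked arrays (at least in the NumPy 1.7.1 used here) don't support passing a tuple as the axis keyword. Instead, you can do: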
print "C2: ", z1.mean(axis=2).mean(axis=1)
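Chaining the two means matches a single mean over both axes here because the mask is rectangular: every unmasked row keeps the same number of elements, so the mean of the row means equals the overall mean. A minimal Python 3 sketch on a small hypothetical array (names such as `zm` and `mask2d` are illustrative only) compares the chained masked mean with a plain mean over the boolean selection:

```python
import numpy as np

z_small = np.arange(2 * 4 * 5, dtype=float).reshape(2, 4, 5)
xcond_small = np.array([False, True, True, True, False])
ycond_small = np.array([False, True, True, False])

# True marks the elements to hide (everything outside the selected rectangle).
mask2d = ~(ycond_small[:, np.newaxis] & xcond_small[np.newaxis, :])
mask3d = np.tile(mask2d, (z_small.shape[0], 1, 1))
zm = np.ma.MaskedArray(z_small, mask=mask3d)

# Workaround from the answer: reduce one axis at a time.
chained = zm.mean(axis=2).mean(axis=1)

# Reference: plain ndarray mean over both axes of the boolean selection.
direct = z_small[:, ycond_small, :][..., xcond_small].mean(axis=(1, 2))

print(chained, direct)
print(np.allclose(np.ma.filled(chained, np.nan), direct))
```

With a non-rectangular mask the rows would hold different numbers of unmasked elements, and the chained form would weight them unequally, so this equivalence should not be assumed in general.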
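It's also worth noting that if you time the complete calculation, including the mean, your original solution2 is faster than solution1, probably because 1) masked arrays are slower than normal numpy arrays, and 2) the masked array has more elements to go through. Of course, solution3 is faster than both, since it is faster at both steps. In short, masked arrays are generally slow, so turning to them for a speed gain is usually not effective.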
print timeit.timeit("z2.mean(axis=(1, 2))", setup="from __main__ import z1, z2", number=N)
print timeit.timeit("z1.mean(axis=2).mean(axis=1)", setup="from __main__ import z1, z2", number=N)
# 0.134118080139 # z2.mean normal numpy
# 1.08952116966 # z1.mean masked
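The second point can be made concrete by counting elements. This Python 3 sketch reuses the sizes and conditions from the question and compares how many elements the masked array holds with the size of the boolean selection; it only counts, it does not time anything. The name `nz` is introduced here for the number of 2D arrays stacked in z.

```python
import numpy as np

xx, yy, nz = 1000, 6000, 2   # sizes from the question; nz = number of 2D arrays in z
x = np.arange(xx)
y = np.arange(yy)
xcond = np.logical_and(x >= xx * 0.4, x <= xx * 0.8)
ycond = np.logical_and(y >= yy * 0.2, y <= yy * 0.5)

# The masked array keeps the full shape of z, plus a mask of the same shape.
masked_size = nz * yy * xx
# The boolean selection keeps only the elements inside the rectangle.
selected_size = nz * int(ycond.sum()) * int(xcond.sum())

print(masked_size, selected_size, masked_size / selected_size)
```

With these conditions the plain selection holds roughly one eighth of the elements that the masked array carries around.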
To test the efficiency of boolean selection along the different axes, make the array square and try each selection separately.
print timeit.timeit("z[:,ycond,:]", setup="from __main__ import z, xcond, ycond, xx, yy", number=N)
print timeit.timeit("z[:,:,xcond]", setup="from __main__ import z, xcond, ycond, xx, yy", number=N)
# running the above with xx=6000, yy=6000 gives
# 1.44903206825
# 5.98445320129
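A self-contained version of this test, as a Python 3 sketch with a smaller hypothetical array (2 x 1000 x 1000) so that it runs quickly. A likely explanation for the gap, stated here as an assumption rather than something the answer claims: in a C-ordered array the last axis is contiguous in memory, so selecting along the middle axis copies whole contiguous rows, whereas selecting along the last axis has to gather individual elements from every row.

```python
import timeit
import numpy as np

n = 1000  # hypothetical size, smaller than the 6000 used above
z_sq = np.arange(2 * n * n, dtype=float).reshape(2, n, n)
cond = np.logical_and(np.arange(n) >= n * 0.4, np.arange(n) <= n * 0.8)

# Passing callables avoids the "from __main__ import ..." setup string.
t_middle = timeit.timeit(lambda: z_sq[:, cond, :], number=10)
t_last = timeit.timeit(lambda: z_sq[:, :, cond], number=10)

print("middle axis: %.4f s" % t_middle)
print("last axis:   %.4f s" % t_last)
```

Absolute timings depend on the machine and the NumPy version, so only the relative ordering is worth comparing with the numbers above.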