Question

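I have an array holding cloud water concentration values in three-dimensional space. Wherever the cloud water concentration exceeds a certain threshold, I say there is a cloud (see the cross-section below). Most of the domain is dry, but there is a stratiform cloud over most of the domain, with its base at about 400 m.

What I want to do is extract the (x, y, z) coordinates of the cloud base and the cloud top. I then want to use those coordinates on a different 3D array, which holds the vertical component of the wind velocity, to get the updraft at cloud base.

What I'm doing now works, but it is slow. I feel there must be a way to use NumPy to speed it up.

This is what I'm doing right now: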

# 3d array representing cloud water at a particular timestep t
qc = QC(t)

# get the coordinates where there is cloud
cloud_coords = argwhere( qc > qc_thresh )

# Arrays to hold the z values of cloud base (cb) and cloud top (ct)
zcb = zeros((nx,ny))
zct = zeros((nx,ny))

# Since each coordinate (x,y) will in general have multiple z values
# for cloud I have to loop over all (x,y) and
# pull out max and min height for each point (x,y)
for x in range(nx):
    # Pull out all the coordinates with a given x value
    xslice = cloud_coords[ where(cloud_coords[:,0] == x) ]

    for y in range(ny):       
        # for the given x value select a particular y value
        column = xslice[ where(xslice[:,1] == y) ]

        try:
            zcb[x,y] = min( column[:,2] )
            zct[x,y] = max( column[:,2] )
        except:
            # Because there may not be any cloud at all
            # (a "hole") we fill the array with an average value
            zcb[x,y] = mean(zcb[zcb.nonzero()])
            zct[x,y] = mean(zct[zct.nonzero()])


# Because I intend to use these as indices I need them to be ints
zcb = array(zcb, dtype='int')
zct = array(zct, dtype='int')

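The output is a 2D array containing the z-coordinates of the cloud base (and top).

I then use these indices on another array to get variables such as the wind speed at cloud base: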

wind = W(t)
j,i = meshgrid(arange(ny),arange(nx))
wind_base = wind[i,j,zcb]

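I do this for many timesteps in the simulation, and the slowest part is the Python loop over all the (x, y) coordinates. Any help with extracting these values faster using NumPy would be much appreciated!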

Answer 1

You were right to suspect that numpy can handle your problem well. In fact you are doing several inefficient things, such as explicitly creating new arrays with np.array() and dtype int at the end, and using [...], which is a complex object in Python 3.

You can do most of the work in a few lines of vectorized numpy. The idea is that it is enough to find the index (along the height axis) at which the cloud first appears, and the index at which it terminates. We can do this in a vectorized manner using numpy.argmax. This is really the heart of the efficient solution:

import numpy as np
import matplotlib.pyplot as plt

# generate dummy data
qc_thresh = 0.6
nx,ny,nz = 400,400,100
qc = np.zeros((nx,ny,nz))
# insert random cloud layer
qc[...,50:80] = np.random.rand(nx,ny,30)
# insert holes in clouds for completeness
qc[np.random.randint(nx,size=2*nx),np.random.randint(ny,size=2*nx),:] = 0

def compute_cloud_boundaries():
    cloud_arr = qc > qc_thresh

    # find boundaries by making use of np.argmax returning first maximum
    zcb = np.argmax(cloud_arr,axis=-1)
    zct = nz - 1 - np.argmax(cloud_arr[...,::-1],axis=-1)

    # logical (nx,ny)-shaped array where there's a cloud
    cloud_inds = (zcb | (zct!=nz-1)).astype(bool)
    # this is short for `(zcb!=0) | (zct!=nz-1)`

    # fill the rest with the mean
    zcb[np.logical_not(cloud_inds)] = zcb[cloud_inds].mean()
    zct[np.logical_not(cloud_inds)] = zct[cloud_inds].mean()

    return zcb,zct

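I checked the above against your approach (on a correspondingly small example), and it gives exactly the same result. As I said, the idea is that cloud_arr = qc > qc_thresh is a logical array telling us where the humidity is large enough to qualify as cloud. We then look at the maximum of this (essentially binary!) array along the last (height) axis. Calling np.argmax gives us, for each planar 2D index, the first (lowermost) height index at which that maximum occurs. To get the cloud top, we reverse the array and do the same from the other side (taking care to convert the result back to an index into the original array). Reversing the array creates a view rather than a copy, so this is efficient too. Finally, we correct the points where there is no cloud; in lieu of a better constraint, we check where the indices returned by argmax correspond to the edge points (0 for the base, nz-1 for the top). With real weather data we can be fairly sure that the lowermost and topmost levels do not contain cloud, so this should be a safe criterion.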
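To make the argmax trick concrete, here is a minimal sketch on a made-up two-column field. The toy values and the has_cloud mask are assumptions added for illustration: any(axis=-1) is an explicit alternative to the edge-index criterion described above, not the method used in the answer's code.

```python
import numpy as np

# Toy field: 2 columns in x, 1 in y, 6 height levels (values are made up).
qc = np.zeros((2, 1, 6))
qc[0, 0, 2:5] = 1.0          # column (0, 0) has cloud on levels 2, 3, 4
qc_thresh = 0.6              # column (1, 0) stays dry (a "hole")
nz = qc.shape[-1]

cloud_arr = qc > qc_thresh

# On a boolean array, argmax returns the index of the first True along the axis
# (and 0 if the column contains no True at all).
zcb = np.argmax(cloud_arr, axis=-1)
# Same trick on the reversed view, then map the index back to the original axis.
zct = nz - 1 - np.argmax(cloud_arr[..., ::-1], axis=-1)

# Assumption: an explicit "this column has cloud" mask, swapped in for the
# edge-index check; it does not rely on levels 0 and nz-1 being cloud-free.
has_cloud = cloud_arr.any(axis=-1)

print(zcb[:, 0], zct[:, 0], has_cloud[:, 0])
```

The dry column comes back as base 0 and top nz-1, which is exactly the signature the edge-index check looks for.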

Here is a cross-section of the dummy data, for show:

Unrepresentative timings for the above 400x400x100 case:

In [24]: %timeit compute_cloud_boundaries()
10 loops, best of 3: 29.1 ms per loop

In [25]: %timeit orig()  # original loopy version from the question
1 loop, best of 3: 9.37 s per loop

That looks like a speed-up of more than 300x. Of course your real use case will be the proper test of this method, but it should be fine.
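Before trusting the speed-up on real data, it may be worth checking the vectorized result against a plain per-column loop on a small array. The sketch below is one way to do that; the function names boundaries_vectorized and boundaries_loop, the array sizes, and the random seed are all hypothetical, and the hole-filling step is left out so that only the raw indices are compared.

```python
import numpy as np

def boundaries_vectorized(qc, qc_thresh):
    """Cloud base/top indices per column via argmax (no hole filling)."""
    cloud = qc > qc_thresh
    nz = qc.shape[-1]
    zcb = np.argmax(cloud, axis=-1)
    zct = nz - 1 - np.argmax(cloud[..., ::-1], axis=-1)
    return zcb, zct, cloud.any(axis=-1)

def boundaries_loop(qc, qc_thresh):
    """Reference implementation: explicit loop over every (x, y) column."""
    nx, ny, nz = qc.shape
    zcb = np.zeros((nx, ny), dtype=int)          # dry columns keep base 0
    zct = np.full((nx, ny), nz - 1, dtype=int)   # and top nz-1, like argmax
    for x in range(nx):
        for y in range(ny):
            levels = np.flatnonzero(qc[x, y] > qc_thresh)
            if levels.size:
                zcb[x, y], zct[x, y] = levels[0], levels[-1]
    return zcb, zct

# Small made-up test field with a cloud layer and one forced dry column
rng = np.random.default_rng(42)
qc = np.zeros((8, 9, 20))
qc[..., 5:15] = rng.random((8, 9, 10))
qc[2, 3, :] = 0

zcb_v, zct_v, has_cloud = boundaries_vectorized(qc, 0.6)
zcb_l, zct_l = boundaries_loop(qc, 0.6)
agree = np.array_equal(zcb_v, zcb_l) and np.array_equal(zct_v, zct_l)
print("vectorized and loop versions agree:", agree)
```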

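As for the indexing step, you can save some memory by using an open mesh as the indices and relying on array broadcasting. Not having to allocate the additional (nx,ny)-shaped arrays might also speed up this step:

wind = W(t)
i,j = np.ogrid[:nx,:ny]
wind_base = wind[i,j,zcb]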

As you can see, np.ogrid creates an open mesh of shapes (nx,1) and (1,ny), which together broadcast to the equivalent of the meshgrid call.
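The equivalence between the open mesh and the dense meshgrid indices can be checked directly. This is a small sketch with made-up sizes; the random wind and zcb arrays are stand-ins for W(t) and the cloud-base indices, not data from the question.

```python
import numpy as np

nx, ny, nz = 3, 4, 5
rng = np.random.default_rng(0)
wind = rng.random((nx, ny, nz))            # stand-in for W(t)
zcb = rng.integers(0, nz, size=(nx, ny))   # stand-in for cloud-base indices

# Dense index arrays, as in the question: two full (nx, ny) integer arrays
j_full, i_full = np.meshgrid(np.arange(ny), np.arange(nx))
base_dense = wind[i_full, j_full, zcb]

# Open mesh: shapes (nx, 1) and (1, ny), broadcast against zcb's (nx, ny)
i, j = np.ogrid[:nx, :ny]
base_open = wind[i, j, zcb]

same = np.array_equal(base_dense, base_open)
print(i.shape, j.shape, same)
```

Both indexing forms select wind[x, y, zcb[x, y]] for every column; the open mesh just avoids materializing the two dense index arrays.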
