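I need to bin three 4-dimensional arrays to create a multidimensional histogram. The example below uses numpy, but the actual arrays are read from NetCDF files with xarray.
I know that xarray uses dask in the backend, so I tried creating a small dask cluster on the machine I am using, which has 20 cores. I do get a speedup in the digitize step, but none at all in the for loop.
I am hoping someone can help me parallelize the for loop with dask.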
import numpy as np
# Initial datasets
s = np.random.rand(5,2,3,4)
ws = np.random.rand(5,2,3,4)
wd = np.random.rand(5, 2, 3, 4)
# Digitize to different bins
s_map = np.digitize(s, [0, .5, 1])
ws_map = np.digitize(ws, [0, .25, .5, .75, 1])
wd_map = np.digitize(wd, [.25, .5, 1])
# Get indexes that have values
s_ids = np.unique(s_map)
ws_ids = np.unique(ws_map)
wd_ids = np.unique(wd_map)
# Create output array
count = np.zeros((s_ids.size, ws_ids.size, wd_ids.size) + s.shape[1:])
# Loop over each of the maps to count how many values fall into each bin
for i, s_id in enumerate(s_ids):
    s_mask = s_map == s_id
    for j, ws_id in enumerate(ws_ids):
        ws_mask = s_mask & (ws_map == ws_id)
        for k, wd_id in enumerate(wd_ids):
            mask = ws_mask & (wd_map == wd_id)
            count[i, j, k, ...] += np.count_nonzero(mask, axis=0)
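
For comparison, here is a minimal sketch of what the triple loop computes, written without Python-level loops. It is not a dask solution, only plain numpy, and the names `binned_counts`, `bin_idx` and `cell_idx` are hypothetical helpers introduced for illustration. The idea is to turn each element's three bin labels plus its position along the trailing axes into one flat integer and count those integers with `np.bincount`. Since each trailing position is counted independently, the same function could in principle be applied per chunk along those axes, though that has not been tried here.

```python
import numpy as np

# Same setup as in the question, with a seeded generator for repeatability
rng = np.random.default_rng(0)
s = rng.random((5, 2, 3, 4))
ws = rng.random((5, 2, 3, 4))
wd = rng.random((5, 2, 3, 4))

s_map = np.digitize(s, [0, .5, 1])
ws_map = np.digitize(ws, [0, .25, .5, .75, 1])
wd_map = np.digitize(wd, [.25, .5, 1])


def binned_counts(s_map, ws_map, wd_map):
    """Count, per trailing position, how many entries along axis 0 fall in each bin combination."""
    # Map the bin labels onto dense positions 0..n-1, in the same order as np.unique.
    # The reshape keeps this independent of the numpy version's return_inverse shape.
    s_ids, s_pos = np.unique(s_map, return_inverse=True)
    ws_ids, ws_pos = np.unique(ws_map, return_inverse=True)
    wd_ids, wd_pos = np.unique(wd_map, return_inverse=True)
    s_pos = s_pos.reshape(s_map.shape)
    ws_pos = ws_pos.reshape(ws_map.shape)
    wd_pos = wd_pos.reshape(wd_map.shape)

    bins_shape = (s_ids.size, ws_ids.size, wd_ids.size)
    rest = s_map.shape[1:]
    n_rest = int(np.prod(rest))

    # Flat index of the (i, j, k) bin combination for every element
    bin_idx = (s_pos * ws_ids.size + ws_pos) * wd_ids.size + wd_pos
    # Flat index of the position along the trailing axes, broadcast over axis 0
    cell_idx = np.arange(n_rest).reshape(rest)
    flat = bin_idx * n_rest + cell_idx

    n_total = int(np.prod(bins_shape)) * n_rest
    counts = np.bincount(flat.ravel(), minlength=n_total)
    return counts.reshape(bins_shape + rest)


count_vec = binned_counts(s_map, ws_map, wd_map)
```

Unlike the loop version, which accumulates into a float array from `np.zeros`, `count_vec` holds integers; the values should otherwise match `count` element for element.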