Question

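Suppose I have a 3D dask array representing a temperature time series for the entire United States, with dimensions [Time, Lat, Lon]. I want to obtain tabular time series for 100 different locations. With NumPy fancy indexing this would look like [:, [lat1, lat2...], [lon1, lon2...]]. Dask arrays do not yet allow this kind of indexing. Given this limitation, what is the best way to accomplish this task?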
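
For reference, the NumPy selection described in the question can be sketched on a small in-memory array. This is only an illustration: the array `temps`, its size, and the index lists `lat_idx` and `lon_idx` are made-up stand-ins, not data from the question.

```python
import numpy as np

# Hypothetical grid: 4 time steps, 5 latitudes, 6 longitudes.
temps = np.arange(4 * 5 * 6).reshape((4, 5, 6))

# Hypothetical index positions of three locations.
lat_idx = [0, 2, 4]
lon_idx = [1, 3, 5]

# Pointwise (fancy) indexing pairs lat_idx[i] with lon_idx[i],
# giving one time series per location.
series = temps[:, lat_idx, lon_idx]
print(series.shape)  # (4, 3): one row per time step, one column per location

# Indexing each axis separately selects every lat/lon combination instead.
grid = temps[:, lat_idx][:, :, lon_idx]
print(grid.shape)  # (4, 3, 3)
```

The pointwise form is the one the question asks for: 100 locations would yield 100 columns, not a 100 x 100 block.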

Answer 1

Use the vindex indexer. It accepts only pointwise indexing or full slices:

In [1]: import dask.array as da

In [2]: import numpy as np

In [3]: x = np.arange(1000).reshape((10, 10, 10))

In [4]: dx = da.from_array(x, chunks=(5, 5, 5))

In [5]: xcoords = [1, 3, 5]

In [6]: ycoords = [2, 4, 6]

In [7]: x[:, xcoords, ycoords]
Out[7]:
array([[ 12,  34,  56],
       [112, 134, 156],
       [212, 234, 256],
       [312, 334, 356],
       [412, 434, 456],
       [512, 534, 556],
       [612, 634, 656],
       [712, 734, 756],
       [812, 834, 856],
       [912, 934, 956]])

In [8]: dx.vindex[:, xcoords, ycoords].compute()
Out[8]:
array([[ 12, 112, 212, 312, 412, 512, 612, 712, 812, 912],
       [ 34, 134, 234, 334, 434, 534, 634, 734, 834, 934],
       [ 56, 156, 256, 356, 456, 556, 656, 756, 856, 956]])

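Some caveats:

This is not yet available on NumPy arrays, although it has been proposed. See the proposal here.
This is not fully compatible with NumPy fancy indexing, because it always places the new axes at the front. A simple transpose can rearrange them.

For example: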

In [9]: dx.vindex[:, xcoords, ycoords].T.compute()
Out[9]:
array([[ 12,  34,  56],
       [112, 134, 156],
       [212, 234, 256],
       [312, 334, 356],
       [412, 434, 456],
       [512, 534, 556],
       [612, 634, 656],
       [712, 734, 756],
       [812, 834, 856],
       [912, 934, 956]])
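
Applied to the question's [Time, Lat, Lon] layout, the vindex approach might be wrapped as follows. This is a minimal sketch, assuming dask is installed; the helper name `point_series`, the synthetic array, the chunk sizes, and the index values are illustrative and do not come from the original answer.

```python
import dask.array as da
import numpy as np


def point_series(arr, lat_idx, lon_idx):
    """Return a lazy (time, point) array for paired lat/lon index positions.

    Hypothetical helper: vindex puts the new point axis first, so the
    result is transposed to put time first.
    """
    return arr.vindex[:, lat_idx, lon_idx].T


# Synthetic stand-in for a [Time, Lat, Lon] temperature array.
x = np.arange(4 * 5 * 6).reshape((4, 5, 6))
dx = da.from_array(x, chunks=(2, 5, 6))

# Hypothetical index positions of three locations.
lat_idx = [0, 2, 4]
lon_idx = [1, 3, 5]

# Nothing is computed until .compute() is called.
table = point_series(dx, lat_idx, lon_idx).compute()

# Same values as NumPy fancy indexing on the in-memory array.
matches_numpy = np.array_equal(table, x[:, lat_idx, lon_idx])
print(table.shape, matches_numpy)
```

Keeping the transpose inside the helper means callers get the same axis order as NumPy fancy indexing, with time along the rows and one column per location.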

Slicing n individual elements in a dask array
