Question

I'm sure there's a good way to do this, but I'm drawing a blank on the right search terms for Google, so I'll ask here. My problem is this:

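I have two 2-D arrays with the same dimensions. One (array 1) is the accumulated precipitation at each (x, y) point. The other (array 2) is the terrain height on the same (x, y) grid. I want to sum array 1 over the cells whose height in array 2 falls between specific values, and create a bar chart with terrain-height bins on the x-axis and total accumulated precipitation on the y-axis.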

So I want to be able to declare a list of heights (say [0, 100, 200, ..., 1000]) and, for each bin, total up all the precipitation that fell within that bin.

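I can think of a few convoluted ways to do this, but I suspect there's an easier way that hasn't occurred to me. My gut instinct is to loop through my list of heights, mask out anything outside the range, sum the remaining values, add the result to a new array, and repeat.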

I'm wondering whether there's something built into numpy or a similar library that can do this more efficiently.

Answer 1

This code does what you're asking for, with some explanation in the comments:

import numpy as np


def in_range(x, lower_bound, upper_bound):
    # returns whether x is between lower_bound (inclusive) and upper_bound (exclusive);
    # a plain comparison also works for non-integer heights, unlike `x in range(...)`
    return lower_bound <= x < upper_bound


# vectorize allows you to easily 'map' the function to a numpy array
vin_range = np.vectorize(in_range)

# representing your rainfall
rainfall = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# representing your height map
height = np.array([[1, 2, 1], [2, 4, 2], [3, 6, 3]])
# the bands of height you're looking to sum
bands = [[0, 2], [2, 4], [4, 6], [6, 8]]

# computing the actual results you'd want to chart
result = [(band, sum(rainfall[vin_range(height, *band)])) for band in bands]

print(result)

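The second-to-last line is where the magic happens. vin_range(height, *band) uses the vectorized function to create a numpy array of booleans with the same dimensions as height, holding True where the value in height falls within the given band and False otherwise.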

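By using that array to index the array holding the target values (rainfall), you get an array containing only the values whose height is in the target range. Then it's just a matter of summing them.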

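For reference, this is the list comprehension that does all of it in a single step (it can equally be written out in more steps, with the same result):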

result = [(band, sum(rainfall[vin_range(height, *band)])) for band in bands]
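Written out step by step, the same idea might look like the sketch below. It is not the answerer's code: it swaps np.vectorize for direct array comparisons (which also build a boolean mask) and reuses the sample rainfall, height and bands values from above. The names in_band and band_total are only illustrative.

```python
import numpy as np

# Same sample data as in the answer above
rainfall = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
height = np.array([[1, 2, 1], [2, 4, 2], [3, 6, 3]])
bands = [[0, 2], [2, 4], [4, 6], [6, 8]]

result = []
for lower, upper in bands:
    # Boolean mask, True where lower <= height < upper
    in_band = (height >= lower) & (height < upper)
    # Boolean indexing keeps only the rainfall cells inside this band
    band_total = rainfall[in_band].sum()
    result.append(((lower, upper), int(band_total)))

print(result)
# [((0, 2), 4), ((2, 4), 28), ((4, 6), 5), ((6, 8), 8)]
```

The comparisons run on whole arrays at once, so this avoids the per-element Python call that np.vectorize makes.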

Answer 2

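You can use np.bincount together with np.digitize. digitize creates an array of bin indices from the height array height and the bin boundaries bins. bincount then uses those bin indices to sum the data in the array rain. The first and last entries of the result collect any values below bins[0] and at or above bins[-1], which is why the output has len(bins)+1 entries.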

import numpy as np

# set up
rain  = np.random.randint(0,100,(5,5))/10
height = np.random.randint(0,10000,(5,5))/10
bins = [0,250,500,750,10000]

# compute
sums = np.bincount(np.digitize(height.ravel(),bins),rain.ravel(),len(bins)+1)

# result
sums
# array([ 0. , 37. , 35.6, 14.6, 22.4,  0. ])

# check against direct method
[rain[(height>=bins[i]) & (height<bins[i+1])].sum() for i in range(len(bins)-1)]
# [37.0, 35.6, 14.600000000000001, 22.4]
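As a possible alternative (not part of the answer above), np.histogram accepts a weights argument and then returns the sum of the weights per bin rather than a count. The sketch below uses small hand-picked arrays instead of random data so the result can be checked by eye. One difference to be aware of: np.histogram treats the last bin as closed on the right, while the digitize approach leaves it half-open.

```python
import numpy as np

# Small deterministic stand-ins for the height and rain arrays
height = np.array([[10.0, 260.0, 510.0], [240.0, 499.0, 760.0]])
rain = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
bins = [0, 250, 500, 750, 10000]

# Weighted histogram: each cell contributes its rain value to its height bin
sums, edges = np.histogram(height.ravel(), bins=bins, weights=rain.ravel())

# Same totals via digitize + bincount, dropping the two out-of-range slots
via_bincount = np.bincount(
    np.digitize(height.ravel(), bins),
    weights=rain.ravel(),
    minlength=len(bins) + 1,
)[1:-1]

print(sums)          # [5. 7. 3. 6.]
print(via_bincount)  # [5. 7. 3. 6.]
```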

Answer 3

An example using the numpy ma module, which allows you to create masked arrays. From the documentation:

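A masked array is the combination of a standard numpy.ndarray and a mask. A mask is either nomask, indicating that no value of the associated array is invalid, or an array of booleans that determines for each element of the associated array whether the value is valid or not.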

That seems to be what you need in this case.

import numpy as np

pr = np.random.randint(0, 1000, size=(100, 100)) #precipitation map
he = np.random.randint(0, 1000, size=(100, 100)) #height map

bins = np.arange(0, 1001, 200)

values = []
for vmin, vmax in zip(bins[:-1], bins[1:]):
    #creating the masked array, here minimum included inside bin, maximum excluded.
    maskedpr = np.ma.masked_where((he < vmin) | (he >= vmax), pr)
    values.append(maskedpr.sum())

values is the list of totals for each bin, which you can then plot.

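The numpy.ma.masked_where function returns an array that is masked wherever the condition is True, so you need to set the condition to be True outside the bin.
The sum() method performs the sum only over the unmasked elements.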
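To make the masking behaviour concrete, here is a small sketch on a hand-picked 2x2 grid (the values are made up for illustration) with a single bin from 100 (inclusive) to 200 (exclusive). It also shows one edge case worth knowing: when every cell is masked, sum() returns the special np.ma.masked constant rather than 0, so filling with 0 first may be more convenient if empty bins are possible.

```python
import numpy as np

# Illustrative data: precipitation and height on a 2x2 grid
pr = np.array([[1, 2], [3, 4]])
he = np.array([[50, 150], [250, 199]])
vmin, vmax = 100, 200

# Mask (hide) every cell whose height lies outside [vmin, vmax)
masked = np.ma.masked_where((he < vmin) | (he >= vmax), pr)
print(masked.mask)   # True marks the hidden cells
total = masked.sum() # only the unmasked cells (2 and 4) are summed
print(total)         # 6

# Edge case: a bin that contains no cells at all
empty = np.ma.masked_where(np.ones_like(pr, dtype=bool), pr)
print(empty.sum() is np.ma.masked)  # True
safe_total = empty.filled(0).sum()  # replace masked cells with 0 before summing
print(safe_total)                   # 0
```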

Summing data within specific (multiple) ranges