
Question

Consider the matrix quantiles, which is the subset [:8,:3,0] of a 3D matrix of shape (10,355,8).

quantiles = np.array([
    [ 1.        ,  1.        ,  1.        ],
    [ 0.63763978,  0.61848863,  0.75348137],
    [ 0.43439645,  0.42485407,  0.5341457 ],
    [ 0.22682343,  0.18878366,  0.25253915],
    [ 0.16229408,  0.12541476,  0.15263742],
    [ 0.12306046,  0.10372971,  0.09832783],
    [ 0.09271845,  0.08209844,  0.05982584],
    [ 0.06363636,  0.05471266,  0.03855727]])

I want a boolean output of the same shape as the quantiles matrix, where True marks the row in which the median lies:

In [21]: medians
Out[21]: 
array([[False, False, False],
       [ True,  True, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

To achieve this, I have the following algorithm in mind:

1) Identify the entries greater than .5:

In [22]: quantiles>.5
Out[22]: 
array([[ True,  True,  True],
       [ True,  True,  True],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

2) Considering only the subset of values selected by the quantiles>.5 operation, mark the row that minimizes the np.abs distance between the entry and .5. Torturing the terminology a bit, I wish to intersect the two matrices quantiles>.5 and np.argmin(np.abs(quantiles-.5),axis=0) to obtain the result above. However, I cannot for the life of me figure out a way to perform np.argmin on the subset and retain the shape of the quantiles matrix.

PS. Yes, there is a similar question here, but it does not implement my algorithm, which I think could be more efficient at a larger scale.
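
For reference, the two steps described above can be sketched as a plain per-column loop. This is only an illustrative baseline, not something from the original post: the helper name median_rows_loop and its level parameter are hypothetical, and columns with no entry above the level are simply left all False.

```python
import numpy as np

quantiles = np.array([
    [1.        , 1.        , 1.        ],
    [0.63763978, 0.61848863, 0.75348137],
    [0.43439645, 0.42485407, 0.5341457 ],
    [0.22682343, 0.18878366, 0.25253915],
    [0.16229408, 0.12541476, 0.15263742],
    [0.12306046, 0.10372971, 0.09832783],
    [0.09271845, 0.08209844, 0.05982584],
    [0.06363636, 0.05471266, 0.03855727],
])


def median_rows_loop(q, level=0.5):
    """Hypothetical helper: per column, flag the entry above `level` that is closest to it."""
    out = np.zeros(q.shape, dtype=bool)
    for col in range(q.shape[1]):
        # Step 1: rows whose entry is strictly greater than the level.
        rows = np.flatnonzero(q[:, col] > level)
        if rows.size == 0:
            continue  # assumption: leave the column unflagged if nothing qualifies
        # Step 2: among those rows, pick the one closest to the level.
        best = rows[np.argmin(np.abs(q[rows, col] - level))]
        out[best, col] = True
    return out


medians = median_rows_loop(quantiles)
print(medians)
```

A loop like this is slow on a (10,355,8) array compared with a vectorized solution, but it is handy for checking one.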

Answer 1

Approach #1

Here's an approach using broadcasting and a masking trick:

import numpy as np

# Mask of quantiles less than or equal to 0.5 to select the invalid ones
mask1 = quantiles<=0.5

# Since we are dealing with quantiles, the elems won't be > 1, 
# which can be leveraged here as we will add 1s to invalid elems, and 
# then look for argmin across each col
min_idx = (np.abs(quantiles-0.5)+mask1).argmin(0)

# Let some broadcasting magic happen here!
out = min_idx == np.arange(quantiles.shape[0])[:,None]

Step-by-step run

1) Input:

In [37]: quantiles
Out[37]: 
array([[ 1.        ,  1.        ,  1.        ],
       [ 0.63763978,  0.61848863,  0.75348137],
       [ 0.43439645,  0.42485407,  0.5341457 ],
       [ 0.22682343,  0.18878366,  0.25253915],
       [ 0.16229408,  0.12541476,  0.15263742],
       [ 0.12306046,  0.10372971,  0.09832783],
       [ 0.09271845,  0.08209844,  0.05982584],
       [ 0.06363636,  0.05471266,  0.03855727]])

2) Run the code:

In [38]: mask1 = quantiles<=0.5
    ...: min_idx = (np.abs(quantiles-0.5)+mask1).argmin(0)
    ...: out = min_idx == np.arange(quantiles.shape[0])[:,None]
    ...:

3) Analyze the output at each step:

In [39]: mask1
Out[39]: 
array([[False, False, False],
       [False, False, False],
       [ True,  True, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

In [40]: np.abs(quantiles-0.5)+mask1
Out[40]: 
array([[ 0.5       ,  0.5       ,  0.5       ],
       [ 0.13763978,  0.11848863,  0.25348137],
       [ 1.06560355,  1.07514593,  0.0341457 ],
       [ 1.27317657,  1.31121634,  1.24746085],
       [ 1.33770592,  1.37458524,  1.34736258],
       [ 1.37693954,  1.39627029,  1.40167217],
       [ 1.40728155,  1.41790156,  1.44017416],
       [ 1.43636364,  1.44528734,  1.46144273]])

In [41]: (np.abs(quantiles-0.5)+mask1).argmin(0)
Out[41]: array([1, 1, 2])

In [42]: min_idx == np.arange(quantiles.shape[0])[:,None]
Out[42]: 
array([[False, False, False],
       [ True,  True, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
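
Since the question mentions that quantiles is a slice of a 3D array, the same trick can plausibly be applied to the whole array at once by reducing along the first axis. The sketch below is an assumption-laden illustration rather than part of the answer: median_rows is a hypothetical name, the synthetic cube stands in for the real data, and values are assumed to lie in [0, 1] so that adding 1 to invalid entries pushes them past every valid one.

```python
import numpy as np


def median_rows(q, level=0.5):
    """Hypothetical N-D variant of Approach #1, reducing along axis 0."""
    q = np.asarray(q, dtype=float)
    invalid = q <= level
    # Penalize invalid entries by 1, then find the closest valid row per column.
    min_idx = (np.abs(q - level) + invalid).argmin(axis=0)
    # Row indices shaped (n, 1, ..., 1) so they broadcast against min_idx.
    row_ids = np.arange(q.shape[0]).reshape((-1,) + (1,) * (q.ndim - 1))
    return row_ids == min_idx


# Synthetic stand-in data: descending columns with a top row of 1.
rng = np.random.default_rng(0)
cube = np.sort(rng.random((10, 6, 4)), axis=0)[::-1].copy()
cube[0] = 1.0

out = median_rows(cube)
print(out.shape, out.sum(axis=0).min(), out.sum(axis=0).max())
```

Note that argmin always returns an index, so a column with no entry above 0.5 would still get one row flagged; with a top row of 1, as in the sample data, that case does not arise.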

Performance boost: following up on the comments, it seems that to get min_idx we can simply do:

min_idx = (quantiles+mask1).argmin(0)
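
The shortcut works because every valid entry is above 0.5, so among valid entries the smallest value is also the one closest to 0.5. A quick check on synthetic data (an assumption, not the original dataset) suggests the two forms agree whenever each column has at least one entry above 0.5, and shows that they can differ when a column has none:

```python
import numpy as np

# Synthetic stand-in data: descending columns in [0, 1) with a top row of 1.
rng = np.random.default_rng(0)
q = np.sort(rng.random((8, 1000)), axis=0)[::-1]
q[0] = 1.0

mask1 = q <= 0.5
idx_abs = (np.abs(q - 0.5) + mask1).argmin(0)   # original form
idx_short = (q + mask1).argmin(0)               # shortcut form
same = np.array_equal(idx_abs, idx_short)
print(same)

# A column with no entry above 0.5: the two forms pick different rows.
col = np.array([[0.4], [0.3], [0.2]])
m = col <= 0.5
idx_abs_empty = (np.abs(col - 0.5) + m).argmin(0)    # picks the largest value
idx_short_empty = (col + m).argmin(0)                # picks the smallest value
print(idx_abs_empty, idx_short_empty)
```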

Approach #2

This one focuses mainly on memory efficiency.

# Mask of quantiles greater than 0.5 to select the valid ones
mask = quantiles>0.5

# Select valid elems
vals = quantiles.T[mask.T]

# Get valid count per col
count = mask.sum(0)

# Get the min val per col given the mask
minval = np.minimum.reduceat(vals,np.append(0,count[:-1].cumsum()))

# Get final boolean array by just comparing the min vals across each col
out = np.isclose(quantiles,minval)
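
To see what the intermediate arrays look like, here is a self-contained run of Approach #2 on the sample input, cross-checked against Approach #1. The names starts, out1 and out2 are introduced only for this illustration. The sketch assumes every column has at least one entry above 0.5 (true here because the first row is 1); otherwise the segment starts passed to reduceat would not describe non-empty segments.

```python
import numpy as np

quantiles = np.array([
    [1.        , 1.        , 1.        ],
    [0.63763978, 0.61848863, 0.75348137],
    [0.43439645, 0.42485407, 0.5341457 ],
    [0.22682343, 0.18878366, 0.25253915],
    [0.16229408, 0.12541476, 0.15263742],
    [0.12306046, 0.10372971, 0.09832783],
    [0.09271845, 0.08209844, 0.05982584],
    [0.06363636, 0.05471266, 0.03855727],
])

# Approach #2: gather valid values column by column, then reduce per segment.
mask = quantiles > 0.5
vals = quantiles.T[mask.T]                   # valid values, grouped by column
count = mask.sum(0)                          # number of valid values per column
starts = np.append(0, count[:-1].cumsum())   # start offset of each column's segment
minval = np.minimum.reduceat(vals, starts)   # smallest valid value per column
out2 = np.isclose(quantiles, minval)

# Approach #1 for comparison.
mask1 = quantiles <= 0.5
min_idx = (np.abs(quantiles - 0.5) + mask1).argmin(0)
out1 = min_idx == np.arange(quantiles.shape[0])[:, None]

agree = np.array_equal(out1, out2)
print(starts, minval, agree)
```

Because the final step compares values with np.isclose rather than indices, a column that contains the minimum value more than once would have every matching row flagged.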


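Identifying the row that contains the column median in a numpy matrix of cumulative percentiles

2 answers:

Update: comparing my answer with Divakar's:

Sample dataset:

Full dataset

Conclusion:

Approach #1

Approach #2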
