Question

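I have seen this answer explaining how to compute the value of a specific percentile, and this answer explaining how to compute the percentile that corresponds to each element.

Using the first solution, I can compute the value and then scan the original array to find its index.
Using the second solution, I can scan the entire output array for the percentile I am looking for.

However, both require an additional scan if I want to know the index (in the original array) that corresponds to a specific percentile, or the index of the element closest to it.

Is there a more direct or built-in way to get the index that corresponds to a percentile?

Note: My array is not sorted, and I want the index in the original, unsorted array.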

Answer 1

It is a little convoluted, but you can get what you are after with np.argpartition. Let's take a simple array and shuffle it:

>>> a = np.arange(10)
>>> np.random.shuffle(a)
>>> a
array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])

If you want to find, for example, the index of quantile 0.25, it corresponds to the item at position idx of the sorted array:

>>> idx = 0.25 * (len(a) - 1)
>>> idx
2.25

You need to decide how to round that to an int; say you go with the nearest integer:

>>> idx = int(idx + 0.5)
>>> idx
2

If you now call np.argpartition, this is what you get:

>>> np.argpartition(a, idx)
array([7, 5, 4, 3, 2, 1, 6, 0, 8, 9], dtype=int64)
>>> np.argpartition(a, idx)[idx]
4
>>> a[np.argpartition(a, idx)[idx]]
2

It is easy to check that the last two expressions are, respectively, the index and the value of the .25 quantile.
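The steps above can be collected into one small helper. This is only a sketch: the name percentile_index is made up for illustration, it assumes a non-empty 1-D array and a quantile q between 0 and 1, and it uses the same round-to-nearest rule as the answer.

```python
import numpy as np

def percentile_index(a, q):
    """Index into the unsorted 1-D array `a` of the element nearest the q-th quantile (0 <= q <= 1)."""
    a = np.asarray(a)
    # Position the quantile would occupy in the sorted array, rounded to the nearest int.
    k = int(q * (len(a) - 1) + 0.5)
    # After argpartition, position k holds the original index of the k-th smallest element.
    return int(np.argpartition(a, k)[k])

a = np.array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
i = percentile_index(a, 0.25)
print(i, a[i])  # 4 2
```

Because np.argpartition only partially orders the array, this avoids a full sort and needs no second scan over the data.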

Answer 2

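If you want to use numpy, you can also use the built-in percentile function. Since version 1.9.0 of numpy, percentile has the option "interpolation", which lets you pick the lower/higher/nearest percentile value. The following works on unsorted arrays and finds the index of the entry nearest to the percentile: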

import numpy as np
p = 70  # my desired percentile, here 70%
x = np.random.uniform(-4.0, 5.0, size=1000)  # dummy vector, values in [-4, 5)

# index of array entry nearest to percentile value
i_near = abs(x - np.percentile(x, p, interpolation='nearest')).argmin()

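Most people will normally want the nearest percentile value, as above. For completeness, you can just as easily ask for the entry just below or just above the percentile value: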

# index of the array entry at or just above the percentile value:
i_high = abs(x - np.percentile(x, p, interpolation='higher')).argmin()

# index of the array entry at or just below the percentile value:
i_low = abs(x - np.percentile(x, p, interpolation='lower')).argmin()
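One caveat for current NumPy: from version 1.22 the "interpolation" keyword of np.percentile was renamed to "method", and the old spelling is deprecated. The sketch below is one possible way to write the same lookup so that it runs on either side of that rename. The helper names percentile_value and percentile_indices are illustrative only, not part of NumPy.

```python
import numpy as np

def percentile_value(x, p, kind="nearest"):
    """np.percentile restricted to actual array elements, tolerant of the keyword rename."""
    try:
        return np.percentile(x, p, method=kind)         # NumPy >= 1.22
    except TypeError:
        return np.percentile(x, p, interpolation=kind)  # older NumPy

def percentile_indices(x, p):
    """Return (i_low, i_near, i_high): indices into the unsorted 1-D array x."""
    x = np.asarray(x)
    return tuple(int(np.abs(x - percentile_value(x, p, kind)).argmin())
                 for kind in ("lower", "nearest", "higher"))

x = np.array([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
print(percentile_indices(x, 70))  # (1, 1, 8)
```

With 'lower', 'higher' and 'nearest' the returned percentile is always an element of x, so the argmin of the absolute difference lands exactly on that element.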

For older versions of numpy (< v1.9.0), the interpolation option is not available, so the equivalent is:

# Calculate 70th percentile:
pcen=np.percentile(x,p)

i_high=np.asarray([i-pcen if i-pcen>=0 else x.max()-pcen for i in x]).argmin()
i_low=np.asarray([i-pcen if i-pcen<=0 else x.min()-pcen for i in x]).argmax()
i_near=abs(x-pcen).argmin()

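To summarize:

i_high points to the array entry whose value is the next one equal to or greater than the requested percentile.

i_low points to the array entry whose value is the next one equal to or smaller than the requested percentile.

i_near points to the array entry closest to the requested percentile, which can be larger or smaller.

My results are: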

pcen

2.3436832738049946

x[i_high]

2.3523077864975441

x[i_low]

2.339987054079617

x[i_near]

2.339987054079617

i_high,i_low,i_near

(876, 368, 368)

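I.e. position 876 holds the closest value exceeding pcen, but position 368 is even closer, although slightly smaller than the percentile value.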

Answer 3

You can use df.quantile() to select the values in a df that fall in a specified quantile.

df_metric_95th_percentile = df.metric[df.metric >= df['metric'].quantile(q=0.95)]
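As a small illustration of how this relates to indices: the selected Series keeps the original row labels, so its index gives the positions in the unsorted data. The sketch below assumes pandas is installed; the column values and the 0.75 quantile are invented for the example.

```python
import pandas as pd

# Hypothetical data; the column name 'metric' follows the answer above.
df = pd.DataFrame({"metric": [5, 6, 4, 9, 2, 1, 3, 0, 7, 8]})

threshold = df["metric"].quantile(q=0.75)  # linear interpolation by default
top = df.loc[df["metric"] >= threshold, "metric"]

print(threshold)            # 6.75
print(top.index.tolist())   # [3, 8, 9]
```

Note that this returns every row at or above the quantile rather than one index, and that the default quantile is interpolated, so the threshold itself need not be a value in the column.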

Answer 4

You can use numpy's np.percentile like this.

import random
import numpy as np

percentile = 75
mylist = [random.random() for i in range(100)] # random list

percidx = mylist.index(np.percentile(mylist, percentile, interpolation='nearest'))

Answer 5

Using numpy,

conf.py

This returns 11, as desired.

Answer 6

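Assuming the array is sorted... unless I am misunderstanding you, you can compute the index of a percentile by taking the length of the array minus 1, multiplying it by the quantile, and rounding to the nearest integer: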

round( (len(array) - 1) * (percentile / 100.) )

That should give you the index closest to that percentile.
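A minimal sketch of that formula, wrapped in a function whose name (sorted_percentile_index) is made up here. It assumes the percentile is given on a 0 to 100 scale and that the sequence is already sorted.

```python
def sorted_percentile_index(n, percentile):
    """Index of the element nearest `percentile` (0-100) in a sorted sequence of length n."""
    if n <= 0:
        raise ValueError("sequence must not be empty")
    return int(round((n - 1) * (percentile / 100.0)))

data = sorted([5, 6, 4, 9, 2, 1, 3, 0, 7, 8])
k = sorted_percentile_index(len(data), 25)
print(k, data[k])  # 2 2
```

In Python 3, round() rounds exact halves to the nearest even integer, so for 10 elements the 50th percentile maps to index 4 rather than 5. Use int(x + 0.5) instead if halves should always round up.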

How to get the index of a specific percentile in numpy / scipy?
