Question

I am using the following code to compute the quartiles of a given data set:

#!/usr/bin/python

import numpy as np

series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]

p1 = 25
p2 = 50
p3 = 75

q1 = np.percentile(series,  p1)
q2 = np.percentile(series,  p2)
q3 = np.percentile(series,  p3)

print('percentile(' + str(p1) + '): ' + str(q1))
print('percentile(' + str(p2) + '): ' + str(q2))
print('percentile(' + str(p3) + '): ' + str(q3))

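The percentile function returns the quartiles, but I would also like to get the indices that mark the boundaries of the quartiles. Is there a way to do this?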

Answer 1

Try this:

import numpy as np
import pandas as pd
series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]
thresholds = [25,50,75]
output = pd.DataFrame([np.percentile(series,x) for x in thresholds], index = thresholds, columns = ['quartiles'])
output

By making it a DataFrame, you can assign the index very easily.
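To make the effect of that index concrete, here is a minimal sketch of the same idea. It assumes you only want to label each quartile value with its percentile; the resulting index holds the percentile labels (25, 50, 75), not positions within `series`. The name `q3_value` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

series = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8]
thresholds = [25, 50, 75]

# np.percentile accepts a sequence of percentiles and returns one value each.
output = pd.DataFrame(
    {"quartiles": np.percentile(series, thresholds)},
    index=thresholds,
)

# Look up a quartile value by its percentile label.
q3_value = output.loc[75, "quartiles"]

print(output)
print(q3_value)
```

Note that `output.loc[75, "quartiles"]` retrieves the value of the third quartile; it does not say where in `series` that value sits.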

Answer 2

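Assuming the data is always sorted (thanks @juanpa.arrivillaga), you can use the rank method of the Pandas Series class. rank() takes several arguments. One of them is pct:

pct : boolean, default False

Computes percentage rank of data

There are different ways to compute the percentage rank. These are controlled by the method argument:

method : {'average', 'min', 'max', 'first', 'dense'}

You need the method "max":

max: highest rank in group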
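As a small illustration of why the tie-handling method matters, the sketch below ranks a short series with one tied pair under three of the methods. The series `s` and the name `ranks` are made up for this example and are not part of the original answer.

```python
import pandas as pd

# A tiny sorted series with one tied pair (the two 2s).
s = pd.Series([1, 2, 2, 5])

# Percentage ranks under three tie-handling methods.
ranks = pd.DataFrame(
    {m: s.rank(method=m, pct=True) for m in ["average", "min", "max"]}
)
print(ranks)
```

With "max", every member of a tied group receives the rank of the last member of that group, so the percentage rank of a value reflects the fraction of the data that is less than or equal to it.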

Let's look at the output of the rank() method with these parameters:

import numpy as np
import pandas as pd

series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]

S = pd.Series(series)
percentage_rank = S.rank(method="max", pct=True)
print(percentage_rank)

This basically gives you the percentile for every entry in the Series:

0     0.0625
1     0.6875
2     0.6875
3     0.6875
4     0.6875
5     0.6875
6     0.6875
7     0.6875
8     0.6875
9     0.6875
10    0.6875
11    0.8125
12    0.8125
13    0.8750
14    0.9375
15    1.0000
dtype: float64

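To retrieve the indices for the three percentiles, look up the first element in the Series whose percentage rank is equal to or higher than the percentile you are interested in. The index of that element is the index you need.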

index25 = S.index[percentage_rank >= 0.25][0]
index50 = S.index[percentage_rank >= 0.50][0]
index75 = S.index[percentage_rank >= 0.75][0]

print("25 percentile: index {}, value {}".format(index25, S[index25]))
print("50 percentile: index {}, value {}".format(index50, S[index50]))
print("75 percentile: index {}, value {}".format(index75, S[index75]))

This gives you the output:

25 percentile: index 1, value 2
50 percentile: index 1, value 2
75 percentile: index 11, value 5
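The lookup above can be wrapped in a small helper so that it works for any percentile. This is only a sketch under the same assumption as the answer (the data is sorted in ascending order); the function name `percentile_index` is hypothetical.

```python
import pandas as pd


def percentile_index(values, q):
    """Return the position of the first element whose percentage rank
    (method="max") is at least q percent. Assumes `values` is sorted."""
    s = pd.Series(values)
    pct_rank = s.rank(method="max", pct=True)
    # Boolean-mask the index and take the first matching label.
    return int(s.index[pct_rank >= q / 100.0][0])


series = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8]
indices = {q: percentile_index(series, q) for q in (25, 50, 75)}
print(indices)
```

For the example data this reproduces the indices printed above: 1 for the 25th and 50th percentiles and 11 for the 75th.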

Answer 3

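Since the data is sorted, you can use numpy.searchsorted to return the indices at which the values would have to be inserted to maintain the sorted order. You can specify on which 'side' to insert the value.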

>>> np.searchsorted(series,q1)
1
>>> np.searchsorted(series,q1,side='right')
11
>>> np.searchsorted(series,q2)
1
>>> np.searchsorted(series,q3)
11
>>> np.searchsorted(series,q3,side='right')
13
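The session above relies on `series`, `q1`, `q2` and `q3` from the question. A self-contained sketch of the same approach, computing both sides for all three quartiles at once, might look like the following; the names `quartiles`, `left` and `right` are chosen here for illustration.

```python
import numpy as np

# searchsorted requires the data to be sorted in ascending order.
series = np.array([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8])

quartiles = np.percentile(series, [25, 50, 75])

# side="left": index of the first element >= the quartile value.
left = np.searchsorted(series, quartiles, side="left")
# side="right": index just past the last element <= the quartile value.
right = np.searchsorted(series, quartiles, side="right")

for q, lo, hi in zip(quartiles, left, right):
    print("value {}: left {}, right {}".format(q, lo, hi))
```

If a quartile is an interpolated value that does not occur in the data, both sides return the same insertion point.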

Python: get the array indices of the quartiles
