Using NumPy, how is the 25th percentile of the numbers 1 to 10 calculated?

Time: 2019-11-28 10:47:04

Tags: python numpy percentile quartile iqr

import numpy as np
from numpy import percentile

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# calculate quartiles
quartile_1 = percentile(data, 25)
quartile_3 = percentile(data, 75)

print(quartile_1)  # shows 3.25
print(quartile_3)  # shows 7.75

Can you explain how the values 3.25 and 7.75 are calculated? I expected them to be 3 and 8, respectively.

5 answers:

Answer 0: (score: 1)

NumPy 1.9.0 and later provides an optional "interpolation" parameter, which defaults to linear.

  

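This optional parameter specifies the interpolation method to use when the desired percentile lies between two data points i < j: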

"linear": i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

If you want to change this behavior, pass the parameter explicitly, for example interpolation='nearest', to override the default.
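As a rough illustration of the "linear" rule quoted above, the sketch below evaluates i + (j - i) * fraction by hand for the 25th percentile and compares it with the default result of np.percentile. The variable names (index, lower, fraction, linear) are chosen here for illustration, and the input is assumed to be already sorted.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # assumed already sorted
q = 25

# Position of the q-th percentile on the 0..N-1 index scale
index = (data.size - 1) * q / 100
lower = int(np.floor(index))
fraction = index - lower  # fractional part of the index

# Values at the two neighbouring indices (i below, j above)
i = data[lower]
j = data[min(lower + 1, data.size - 1)]

linear = i + (j - i) * fraction
print(linear, np.percentile(data, q))
```

Both printed numbers should agree, which is where the 3.25 in the question comes from.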

Answer 1: (score: 0)

From the numpy documentation:

  

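Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V. The values and distances of the two nearest neighbors, as well as the interpolation parameter, will determine the percentile if the normalized ranking does not match the location of q exactly. This function is the same as the median if q = 50, the same as the minimum if q = 0, and the same as the maximum if q = 100.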

So the question comes down to how numpy behaves when no data point matches the quantile exactly. If you use interpolation="nearest", you get the result you expected:

>>> from numpy import percentile
>>> import numpy as np
>>> data=np.array([1,2,3,4,5,6,7,8,9,10])
>>> # calculate quartiles
... quartile_1 = percentile(data, 25, interpolation="nearest")
>>> quartile_3 = percentile(data, 75, interpolation="nearest")
>>> print(quartile_1) 
3
>>> print(quartile_3) 
8
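One caveat worth a sketch: starting with NumPy 1.22 this option is exposed under the keyword method, and interpolation is deprecated. The helper below, percentile_with_rule, is a hypothetical name introduced here; it simply tries the newer keyword first and falls back to the older one, so it is an assumption-laden convenience rather than anything NumPy itself provides.

```python
import numpy as np


def percentile_with_rule(values, q, rule):
    """Hypothetical helper: pass the rule under whichever keyword this NumPy accepts."""
    try:
        # NumPy 1.22 and later: the keyword is called `method`
        return np.percentile(values, q, method=rule)
    except TypeError:
        # Older releases only know the `interpolation` keyword
        return np.percentile(values, q, interpolation=rule)


data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
quartile_1 = percentile_with_rule(data, 25, "nearest")
quartile_3 = percentile_with_rule(data, 75, "nearest")
print(quartile_1, quartile_3)
```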

Answer 2: (score: 0)

Several options are available, depending on which interpolation method you want to use to compute the percentile.

import numpy as np

a = np.arange(1, 11)
a  # array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

np.percentile(a, (25, 75), interpolation='midpoint') # array([3.5, 7.5])
np.percentile(a, (25, 75), interpolation='nearest')  # array([3, 8])
np.percentile(a, (25, 75), interpolation='linear')   # array([3.25, 7.75])
np.percentile(a, (25, 75), interpolation='lower')    # array([3, 7])
np.percentile(a, (25, 75), interpolation='higher')   # array([4, 8])

You will notice that the percentiles have to be derived from the cumulative relative frequency:

c = np.cumsum(a)
c  # array([ 1,  3,  6, 10, 15, 21, 28, 36, 45, 55], dtype=int32)
c / c[-1] * 100
# array([  1.81818182,   5.45454545,  10.90909091,  18.18181818,
#         27.27272727,  38.18181818,  50.90909091,  65.45454545,
#         81.81818182, 100.        ])

and that the 25th and 75th percentiles require some form of interpolation.

Answer 3: (score: 0)

While this may be an interpolation issue, some quartile methods (namely Method 2) would give exactly [3, 8] as the answer.

According to my other answers on this topic, numpy uses Method 3 instead.

Unfortunately, until the statistics community agrees on a single definition of quartiles, the confusion will continue.
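To make the [3, 8] result concrete, here is a minimal sketch of a "median of halves" quartile rule. It assumes that Method 2 refers to the common textbook convention in which the sorted data is split at the median and, for an odd number of points, the median is included in both halves. The function name quartiles_median_of_halves is hypothetical and not part of NumPy.

```python
import numpy as np


def quartiles_median_of_halves(values):
    """Hypothetical helper: quartiles as the medians of the lower and upper halves."""
    ordered = np.sort(np.asarray(values))
    n = ordered.size
    half = n // 2
    if n % 2 == 0:
        # Even count: split the data exactly in half
        lower, upper = ordered[:half], ordered[half:]
    else:
        # Odd count: include the median in both halves (assumed convention)
        lower, upper = ordered[:half + 1], ordered[half:]
    return float(np.median(lower)), float(np.median(upper))


q1, q3 = quartiles_median_of_halves([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(q1, q3)
```

For the numbers 1 to 10 the halves are 1..5 and 6..10, whose medians are 3 and 8, matching the values the question expected.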

Answer 4: (score: 0)

Computing the NumPy percentile by hand, step by step:

Step 1: Find the length

x = [1,2,3,4,5,6,7,8,9,10]
l = len(x) 
# Output --> 10

Step 2: Subtract 1 to get the distance from the first item to the last item in x

n = len(x) - 1
# n = (10 - 1)
# Output --> 9

Step 3: Multiply n by the quantile; here that is the 25th percentile, i.e. the 0.25 quantile or first quartile

n * 0.25
# Therefore, (9 * 0.25)
# Output --> 2.25
# So the fractional part of 2.25 is 0.25
# m = 0.25

Step 4: Now get the final answer

For linear:

# i + (j - i) * m
# Here, think i and j as values at indices
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, for '2.25':
# value at index immediately before 2.25, is at index=2 so, i=3
# value at index immediately after 2.25, is at index=3 so, j=4
# and the fraction is m=0.25
3 + (4 - 3)*0.25
# Output --> 3.25

For lower:

# Here, based on output from Step-3
# Because, it is '2.25', 
# Find the value at the index just below 2.25
# So, lower index is '2'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=2 we have '3' 
# Output --> 3

For higher:

# Here, based on output from Step-3
# Because, it is '2.25', 
# Find the value at the index just above 2.25
# So, higher index is '3'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=3 we have '4' 
# Output --> 4

For nearest:

# Here, based on output from Step-3
# Because, it is '2.25', 
# Find the value at the index nearest to 2.25
# So, nearest index is '2'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=2 we have '3' 
# Output --> 3

For midpoint:

# Here, based on output from Step-3
# (i + j)/2
# Here, think i and j as values at indices
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, for '2.25'
# value at index immediately before 2.25, is at index=2 so, i=3
# value at index immediately after 2.25, is at index=3 so, j=4
(3+4)/2
# Output --> 3.5

Code in Python:

import numpy as np

x = np.array([1,2,3,4,5,6,7,8,9,10])
print("linear:", np.percentile(x, 25, interpolation='linear'))
print("lower:", np.percentile(x, 25, interpolation='lower'))
print("higher:", np.percentile(x, 25, interpolation='higher'))
print("nearest:", np.percentile(x, 25, interpolation='nearest'))
print("midpoint:", np.percentile(x, 25, interpolation='midpoint'))

Output:

linear: 3.25
lower: 3
higher: 4
nearest: 3
midpoint: 3.5
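The manual steps above can be gathered into one small pure-Python function. This is only a sketch of the walkthrough, not NumPy's actual implementation: the name manual_percentile is hypothetical, and the handling of exact ties for "nearest" (Python's round, which rounds halves to the even index) is an assumption that does not come into play for this data.

```python
import math


def manual_percentile(values, q, rule="linear"):
    """Hypothetical helper mirroring Steps 1 to 4 of the walkthrough."""
    ordered = sorted(values)
    # Steps 1-3: position on the 0..(length - 1) index scale
    pos = (len(ordered) - 1) * q / 100
    lo_idx, hi_idx = math.floor(pos), math.ceil(pos)
    i, j = ordered[lo_idx], ordered[hi_idx]  # neighbouring values
    m = pos - lo_idx                         # fractional part
    # Step 4: apply the chosen rule
    if rule == "linear":
        return i + (j - i) * m
    if rule == "lower":
        return i
    if rule == "higher":
        return j
    if rule == "midpoint":
        return (i + j) / 2
    if rule == "nearest":
        return ordered[round(pos)]
    raise ValueError(f"unknown rule: {rule}")


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for rule in ("linear", "lower", "higher", "nearest", "midpoint"):
    print(rule, manual_percentile(x, 25, rule), manual_percentile(x, 75, rule))
```

Running it for q = 25 reproduces the five values listed in the output above, and q = 75 gives the corresponding upper-quartile values.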