from numpy import percentile
import numpy as np
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# calculate quartiles
quartile_1 = percentile(data, 25)
quartile_3 = percentile(data, 75)
# print the quartiles
print(quartile_1)  # shows 3.25
print(quartile_3)  # shows 7.75
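Can you explain how the values 3.25 and 7.75 are calculated? I expected them to be 3 and 8 respectively.
Answer 0: (score: 1)
NumPy version 1.9.0 or later has an optional `interpolation` parameter, which defaults to linear.
This optional parameter specifies the interpolation method to use when the desired percentile lies between two data points i < j.
'linear': i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
If you want to change this behavior, just pass the parameter explicitly and use interpolation='nearest'.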
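To make the 'linear' rule concrete, here is a minimal pure-Python sketch of it. The function name `linear_percentile` and its internals are illustrative assumptions, not NumPy's actual implementation; the sketch only follows the formula quoted above, with the index taken as (N - 1) * q / 100 over the sorted data.

```python
import math

def linear_percentile(values, q):
    """Hypothetical helper: percentile via linear interpolation between neighbours."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted data
    lo = math.floor(pos)                  # index of the left neighbour
    hi = min(lo + 1, len(ordered) - 1)    # index of the right neighbour (clamped)
    fraction = pos - lo                   # fractional part of the index
    i, j = ordered[lo], ordered[hi]       # the two surrounding data points
    return i + (j - i) * fraction

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q1 = linear_percentile(data, 25)  # index 2.25 -> 3 + (4 - 3) * 0.25
q3 = linear_percentile(data, 75)  # index 6.75 -> 7 + (8 - 7) * 0.75
print(q1, q3)
```

For the data in the question this gives 3.25 and 7.75, which is why the default result is not 3 and 8.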
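Answer 1: (score: 0)
From the numpy documentation:
Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V. The values and distances of the two nearest neighbors as well as the interpolation parameter will determine the percentile if the normalized ranking does not match the location of q exactly. This function is the same as the median if q=50, the same as the minimum if q=0 and the same as the maximum if q=100.
So the issue is how numpy behaves when no exact match for the quantile is found. If you use interpolation="nearest", you get the expected result: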
>>> from numpy import percentile
>>> import numpy as np
>>> data=np.array([1,2,3,4,5,6,7,8,9,10])
>>> # calculate quartiles
... quartile_1 = percentile(data, 25, interpolation="nearest")
>>> quartile_3 = percentile(data, 75, interpolation="nearest")
>>> print(quartile_1)
3
>>> print(quartile_3)
8
Answer 2: (score: 0)
Several options are available, depending on the type of interpolation you want the percentile calculation to use.
a = np.arange(1, 11)
a # array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
np.percentile(a, (25, 75), interpolation='midpoint') # array([3.5, 7.5])
np.percentile(a, (25, 75), interpolation='nearest') # array([3, 8])
np.percentile(a, (25, 75), interpolation='linear') # array([3.25, 7.75])
np.percentile(a, (25, 75), interpolation='lower') # array([3, 7])
np.percentile(a, (25, 75), interpolation='higher') # array([4, 8])
You will notice that the cumulative relative frequency is what the percentiles need to be derived from:
c = np.cumsum(a)
c # ---- array([ 1, 3, 6, 10, 15, 21, 28, 36, 45, 55], dtype=int32)
c/c[-1] * 100
array([ 1.81818182, 5.45454545, 10.90909091, 18.18181818,
27.27272727, 38.18181818, 50.90909091, 65.45454545,
81.81818182, 100. ])
and the 25th and 75th percentiles require some form of interpolation.
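Answer 3: (score: 0)
While this may be an interpolation issue, some quartile methods (namely Method 2) should give exactly [3, 8].
Unfortunately, until the statistics field settles on a uniform definition of quartiles, the confusion will continue.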
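As a rough illustration, the sketch below assumes that "Method 2" means the common median-of-halves definition: split the sorted data at the median, include the median in both halves when the length is odd, and take the median of each half. That reading, and the helper name `quartiles_median_of_halves`, are assumptions; the answer itself does not spell the method out.

```python
from statistics import median

def quartiles_median_of_halves(values):
    """Hypothetical helper: Q1 and Q3 as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2:
        # Odd length: the overall median is included in both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        # Even length: split the data exactly in half.
        lower, upper = ordered[:half], ordered[half:]
    return median(lower), median(upper)

q1, q3 = quartiles_median_of_halves([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(q1, q3)
```

Under that assumption the halves are [1, 2, 3, 4, 5] and [6, 7, 8, 9, 10], so the quartiles come out as 3 and 8, matching what the question expected.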
Answer 4: (score: 0)
Step-by-step manual calculation of the NumPy percentile:
Step 1: Find the length
x = [1,2,3,4,5,6,7,8,9,10]
l = len(x)
# Output --> 10
Step 2: Subtract 1 to get the distance from the first item to the last item in x
# n = (length - 1)
# n = (10-1)
# Output --> 9
Step 3: Multiply n by the quantile, here the 25th percentile, i.e. the 0.25 quantile or first quartile
n * 0.25
# Therefore, (9 * 0.25)
# Output --> 2.25
# So, the fractional part of 2.25 is 0.25
# m = 0.25
Step 4: Now get the final answer
For linear:
# i + (j - i) * m
# Here, think i and j as values at indices
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, for '2.25':
# value at index immediately before 2.25, is at index=2 so, i=3
# value at index immediately after 2.25, is at index=3 so, j=4
# and the fraction is m = 0.25
3 + (4 - 3)*0.25
# Output --> 3.25
For lower:
# Here, based on output from Step-3
# Because, it is '2.25',
# Find the value at the index just below 2.25
# So, lower index is '2'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=2 we have '3'
# Output --> 3
For higher:
# Here, based on output from Step-3
# Because, it is '2.25',
# Find the value at the index just above 2.25
# So, higher index is '3'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=3 we have '4'
# Output --> 4
For nearest:
# Here, based on output from Step-3
# Because, it is '2.25',
# Find the value at the index nearest to 2.25
# So, nearest index is '2'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=2 we have '3'
# Output --> 3
For midpoint:
# Here, based on output from Step-3
# (i + j)/2
# Here, think i and j as values at indices
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, for '2.25'
# value at index immediately before 2.25, is at index=2 so, i=3
# value at index immediately after 2.25, is at index=3 so, j=4
(3+4)/2
# Output --> 3.5
Code in Python:
x = np.array([1,2,3,4,5,6,7,8,9,10])
print("linear:", np.percentile(x, 25, interpolation='linear'))
print("lower:", np.percentile(x, 25, interpolation='lower'))
print("higher:", np.percentile(x, 25, interpolation='higher'))
print("nearest:", np.percentile(x, 25, interpolation='nearest'))
print("midpoint:", np.percentile(x, 25, interpolation='midpoint'))
Output:
linear: 3.25
lower: 3
higher: 4
nearest: 3
midpoint: 3.5
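The manual steps above can also be collected into one small function. The following is a sketch in pure Python, so it does not depend on which keyword the installed NumPy accepts (releases from 1.22 onward name this parameter `method` rather than `interpolation`). The name `manual_percentile` is hypothetical, and the tie-breaking for 'nearest' when the index falls exactly on .5 is an assumption that may differ from NumPy.

```python
import math

def manual_percentile(values, q, method="linear"):
    """Hypothetical helper mirroring Steps 1 to 4 for the five methods."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0  # Steps 1-3: fractional index, e.g. 2.25
    lo, hi = math.floor(pos), math.ceil(pos)
    i, j = ordered[lo], ordered[hi]       # values just before and just after pos
    if method == "linear":
        return i + (j - i) * (pos - lo)
    if method == "lower":
        return i
    if method == "higher":
        return j
    if method == "nearest":
        # Python's round() sends exact .5 ties to the even index (assumption).
        return ordered[round(pos)]
    if method == "midpoint":
        return (i + j) / 2
    raise ValueError(f"unknown method: {method}")

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
methods = ("linear", "lower", "higher", "nearest", "midpoint")
results = {m: manual_percentile(x, 25, m) for m in methods}
for name, value in results.items():
    print(f"{name}: {value}")
```

For q = 25 this reproduces the five values listed in the output above; calling it with q = 75 gives 7.75, 7, 8, 8 and 7.5.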