Question

We can create an ECDF with

import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF
ecdf = ECDF([3, 3, 1, 4])

and then evaluate the ECDF at x with

ecdf(x)

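But what if I want to know the x at which the percentile is 97.5%?

From http://www.statsmodels.org/stable/generated/statsmodels.distributions.empirical_distribution.ECDF.html?highlight=ecdf, it seems this is not implemented.

Is there a way to do this? Or is there another library that can?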

Answer 1

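Since the empirical CDF just places a mass of 1/n at each data point, the 97.5th percentile is simply the data point that is greater than or equal to 97.5% of the points. To find it, sort the data in ascending order and take the value at position 0.975n.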

sample = [1, 5, 2, 10, -19, 4, 7, 2, 0, -1]
n = len(sample)
ordered = sorted(sample)  # ascending order
# int() truncates 0.975 * n to a 0-based index into the sorted data
print(ordered[int(n * 0.975)])

This prints:

10

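Recall that for discrete distributions (such as the empirical CDF), the quantile function is defined here (sorry, I can't embed the image because this is my first post) as the smallest x whose CDF value is at least p. From that definition we see that we must take the value at position 0.975n, rounded up, in the ascending sort.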
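The rounded-up rule can be sketched as a small function. This is a minimal sketch, not part of statsmodels: the name `ecdf_quantile` is hypothetical, and instead of computing ceil(0.975n) it compares k/n against p directly, which avoids floating-point surprises when n * p should be a whole number.

```python
def ecdf_quantile(sample, p):
    """Return the smallest sample value whose ECDF value is at least p.

    Hypothetical helper for illustration; assumes 0 < p <= 1 and a
    non-empty sample.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(sample)  # ascending order
    n = len(ordered)
    # The k-th smallest value (1-based) has an ECDF value of at least k / n,
    # so the first k with k / n >= p is the rank rounded up from n * p.
    for k, value in enumerate(ordered, start=1):
        if k / n >= p:
            return value


sample = [1, 5, 2, 10, -19, 4, 7, 2, 0, -1]
print(ecdf_quantile(sample, 0.975))
```

For the four-point data from the question, `ecdf_quantile([3, 3, 1, 4], 0.75)` returns 3, because three of the four points are less than or equal to 3.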

Hope this helps!

Edit (1/16/18): readability.

Answer 2

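Here is my suggestion: linear interpolation, because distribution functions can only be estimated well from fairly large samples. The interpolating line segments are easy to obtain, because their endpoints fall at the distinct values in the sample.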

import statsmodels.distributions.empirical_distribution as edf
from scipy.interpolate import interp1d
import numpy as np
import matplotlib.pyplot as plt

sample = [1, 4, 2, 6, 5, 5, 3, 3, 5, 7]
sample_edf = edf.ECDF(sample)

# The ECDF only jumps at the distinct sample values
slope_changes = sorted(set(sample))

# Swap the axes: interpolate from ECDF values back to sample values
sample_edf_values_at_slope_changes = [sample_edf(item) for item in slope_changes]
inverted_edf = interp1d(sample_edf_values_at_slope_changes, slope_changes)

# Plot the interpolated inverse over the probabilities it covers
x = np.linspace(0.1, 1)
y = inverted_edf(x)
plt.plot(x, y, 'ro', x, y, 'b-')
plt.show()

print('97.5 percentile:', inverted_edf(0.975))

It produces the following output,

97.5 percentile: 6.75

and this plot.
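The same interpolated inverse can also be sketched with NumPy alone, which may be handy when statsmodels or SciPy is not available. This is an illustrative variant rather than the code above: the name `interpolated_quantile` is hypothetical, and unlike `interp1d`, `np.interp` clamps probabilities outside the covered range to the end values instead of raising an error.

```python
import numpy as np

sample = [1, 4, 2, 6, 5, 5, 3, 3, 5, 7]

# Distinct sample values (sorted) and how often each one occurs
distinct_values, counts = np.unique(sample, return_counts=True)

# ECDF evaluated at each distinct value: cumulative share of the sample
ecdf_at_distinct = np.cumsum(counts) / len(sample)


def interpolated_quantile(p):
    """Linearly interpolate the inverse ECDF at probability p (hypothetical helper)."""
    return float(np.interp(p, ecdf_at_distinct, distinct_values))


print('97.5 percentile:', interpolated_quantile(0.975))  # approximately 6.75
```

The value agrees with the output above because 0.975 lies three quarters of the way between the ECDF values 0.9 (at 6) and 1.0 (at 7).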

Answer 3

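numpy.quantile(x, q=.975) returns the value along array x at which the ECDF reaches 0.975. By default it interpolates linearly between neighbouring data points.

Similarly, pandas Series and DataFrames have a quantile method, e.g. Series.quantile(q=0.975).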
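A short comparison may help here, since the default result of `numpy.quantile` is interpolated and is not necessarily one of the data points. The sketch below assumes NumPy 1.22 or later, where the `method` keyword accepts `"inverted_cdf"`; the sample data is borrowed from Answer 1.

```python
import numpy as np

sample = [1, 5, 2, 10, -19, 4, 7, 2, 0, -1]

# Default method: linear interpolation between neighbouring order statistics
linear = np.quantile(sample, 0.975)

# Step-function inverse of the ECDF: always returns an actual data point
# (assumes NumPy >= 1.22 for the `method` keyword)
step = np.quantile(sample, 0.975, method="inverted_cdf")

print(linear, step)
```

With this sample the interpolated value falls between the two largest points (7 and 10), while the `"inverted_cdf"` result is the largest point itself.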

Python: inverse empirical cumulative distribution function (ECDF)?
