Question

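I have read other posts (for example here) on getting the "inverse" of quantile, i.e. the percentile that corresponds to a given value in a series of values.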

However, for the same data series, the answers do not give me the same value as quantile.

I have also looked into the fact that quantile offers 9 different algorithms (the type argument) for computing percentiles.

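So my question is: is there a reliable way to get the inverse of the quantile function? ecdf does not take a "type" argument, so there seems to be no way to make sure both use the same method.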

Reproducible example:

# Simple data
x = 0:10
pcntile = 0.5

# Get value corresponding to a percentile using quantile
(pcntile_value <- quantile(x, pcntile))
# 50%
#   5             # returns 5, as expected for the 50th percentile

# Get percentile corresponding to a value using the ecdf function
(pcntile_rev <- ecdf(x)(5))
# [1] 0.5454545   # returns 54.54% as the percentile for the value 5

# Not the same answer as quantile produces

Answer 1

The documentation gives the formula used to compute the result of:

x <- 0:10
Fn <- ecdf(x)

Now, the object Fn is an interpolating step function:

str(Fn)
#function (v)
# - attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
# - attr(*, "call")= language ecdf(x)

It keeps the original x values and the corresponding y values:

environment(Fn)$x
# [1]  0  1  2  3  4  5  6  7  8  9 10
environment(Fn)$y
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000
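As a small sketch (not part of the original answer), the same stored values can be reached without looking inside the function's environment: knots() from stats returns the jump points of a step function, and evaluating Fn there returns the stored y values. It also shows that Fn is constant between jump points.

```r
x  <- 0:10
Fn <- ecdf(x)

# knots() returns the jump points of a step function: the sorted unique x values
kn <- knots(Fn)
kn

# Evaluating Fn at its knots returns the stored y values, i.e. k/n
Fn(kn)

# Fn is right-continuous and constant between knots, so Fn(4.5) equals Fn(4)
Fn(4.5) == Fn(4)   # TRUE
```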

The latter are exactly the same as the result of the formula that the documentation says is used to compute them. From help('ecdf'):

For observations x = (x1, x2, ..., xn), Fn is the fraction of observations less than or equal to t, i.e.,

$$F_n(t) = \#\{x_i \le t\}/n = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(x_i \le t).$$

I will use seq_along instead of 1:length(x):

seq_along(x)/length(x)
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000
Fn(x)
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000
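The comparison above evaluates Fn only at the data points. As a minimal sketch, the quoted formula can also be transcribed literally and checked between and beyond the data points; ecdf_by_formula is a hypothetical helper written for this illustration.

```r
x <- 0:10

# Literal transcription of Fn(t) = #{x_i <= t} / n, evaluated for each point
ecdf_by_formula <- function(pts, data) {
  vapply(pts, function(p) sum(data <= p) / length(data), numeric(1))
}

pts <- c(-1, 4.5, 5, 10, 11)
ecdf_by_formula(pts, x)
ecdf(x)(pts)   # same values: 0, 5/11, 6/11, 1, 1
```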


Answer 2

The answer in the link is good, but it may help to look at ecdf itself. Just run the following code:

# Simple data
x = 0:10
p0 = 0.5

# Get value corresponding to a percentile using quantile, for all 9 types
sapply(1:9, function(i) quantile(x, p0, type = i))
# 50% 50% 50% 50% 50% 50% 50% 50% 50%
# 5.0 5.0 5.0 4.5 5.0 5.0 5.0 5.0 5.0

So this is not a question of type. You can use debug to step into the function:

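# Get percentile corresponding to a value using ecdf function
debug(ecdf)
my_ecdf <- ecdf(x)   # step with "n" in the browser until rval has been created
undebug(ecdf)        # switch debugging off again afterwards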

The key part is:

rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/n, 
    method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")

Once that line has run (still inside the browser), you can inspect

data.frame(x = vals, y = round(cumsum(tabulate(match(x, vals)))/n, 3), stringsAsFactors = FALSE)

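and since n = 11, the result is not surprising: the value 5 is the 6th of 11 observations, so ecdf(x)(5) is 6/11 ≈ 0.545. As mentioned, for the theory see the other answer.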
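If stepping through the debugger is inconvenient, the key lines can be run directly. This is a sketch that copies the approxfun() call quoted above into a hypothetical helper, build_step, and compares it with ecdf(); a small sample with ties is added so that the role of tabulate(match(...)) is visible.

```r
# Same steps as the key part of ecdf(), wrapped in a hypothetical helper
build_step <- function(data) {
  xs   <- sort(data)
  n    <- length(xs)
  vals <- unique(xs)
  # number of observations <= each unique value, divided by n
  y    <- cumsum(tabulate(match(xs, vals))) / n
  approxfun(vals, y, method = "constant",
            yleft = 0, yright = 1, f = 0, ties = "ordered")
}

x <- 0:10
build_step(x)(5)                # 6/11, printed as 0.5454545

# With tied values, tabulate() counts the repeats: y is 0.25, 0.75, 1
xt <- c(2, 1, 3, 2)
build_step(xt)(c(0, 1, 2, 3))   # 0.00 0.25 0.75 1.00
ecdf(xt)(c(0, 1, 2, 3))         # same values
```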

By the way, you can also plot the function:

plot(my_ecdf)

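Regarding your comment: I don't think this is a matter of reliability, but of how to define an "inverse distribution function" when a true inverse does not exist: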

A good reference on generalized inverses: Paul Embrechts, Marius Hofert: "A note on generalized inverses", Math Meth Oper Res (2013) 77:423–432 DOI
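To make the point about definitions concrete, here is a sketch of two possible "inverses" for x = 0:10. gen_inverse and inv_quantile7 are hypothetical helpers written for this illustration, not base R functions. The first is the generalized inverse of the empirical CDF, which is what quantile(type = 1) computes (R documents type 1 as the inverse of the empirical distribution function), so ecdf() and type 1 form a matched pair. The second inverts the default type 7, assuming distinct values, by linear interpolation on the grid (k - 1)/(n - 1) that type 7 uses; it returns 0.5 for the value 5.

```r
x  <- 0:10
Fn <- ecdf(x)

# Definition 1: generalized inverse of the empirical CDF (quantile type = 1),
# Q(p) = smallest sorted data value v with Fn(v) >= p, for p in (0, 1]
gen_inverse <- function(p, data) {
  ds <- as.numeric(sort(data))
  Fd <- seq_along(ds) / length(ds)   # Fn evaluated at each sorted data point
  vapply(p, function(pr) ds[which(Fd >= pr)[1]], numeric(1))
}

p <- c(0.1, 0.25, 0.5, 0.9)
gen_inverse(p, x)                        # 1 2 5 9
quantile(x, p, type = 1, names = FALSE)  # 1 2 5 9
# Value -> probability with Fn, then back with gen_inverse, recovers x
gen_inverse(Fn(x), x)

# Definition 2: inverse of the default quantile (type = 7), which interpolates
# linearly between knots at p = (k - 1)/(n - 1); assumes distinct values
inv_quantile7 <- function(v, data) {
  ds <- sort(data)
  approx(ds, (seq_along(ds) - 1) / (length(ds) - 1), xout = v, rule = 2)$y
}

inv_quantile7(5, x)                                 # 0.5
quantile(x, inv_quantile7(2.5, x), names = FALSE)   # 2.5
```

With ties, the type-7 "inverse" is no longer unique, which is exactly the situation the generalized-inverse reference deals with.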

Reliably retrieving the inverse of the quantile function
