Question

Consider this toy example:

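A teacher wants to compute the median height of the students in a class. Not every student attends every day, however, so the median height computed on any given day may vary. The table below gives each student's height and probability of being in class. Given this information, the teacher can estimate the expected median.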

>set.seed(123)
>data1 <- data.frame(Student=c(LETTERS[1:10]), Height.cm=sort( rnorm(n=10, mean=140, sd=10)), Prob.in.class=c(1,.75,1,.5,1,1,1,.25,1,.5))

>data1

   Student Height.cm Prob.in.class
1        A  127.3494          1.00
2        B  133.1315          0.75
3        C  134.3952          1.00
4        D  135.5434          0.50
5        E  137.6982          1.00
6        F  140.7051          1.00
7        G  141.2929          1.00
8        H  144.6092          0.25
9        I  155.5871          1.00
10       J  157.1506          0.50
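
To make "expected median" concrete, here is a minimal simulation sketch. It assumes each student's attendance is independent of everyone else's, which the example above does not state explicitly. The names `n_sims`, `sim_medians`, and `expected_median` are illustrative only.

```r
# Monte Carlo sketch: simulate many class days, take the median height of
# whoever is present each day, then average those daily medians.
set.seed(123)
data1 <- data.frame(
  Student = LETTERS[1:10],
  Height.cm = sort(rnorm(n = 10, mean = 140, sd = 10)),
  Prob.in.class = c(1, .75, 1, .5, 1, 1, 1, .25, 1, .5)
)

n_sims <- 10000  # number of simulated days (arbitrary choice)

sim_medians <- replicate(n_sims, {
  # Assumption: each student attends independently with the given probability
  present <- runif(nrow(data1)) < data1$Prob.in.class
  median(data1$Height.cm[present])
})

expected_median <- mean(sim_medians)
expected_median
```

This is slow for large inputs, which is why a direct (non-simulated) estimate is what I am really after.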

What is the fastest way in R to estimate the median (or an arbitrary quantile) of such a distribution?

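For my actual calculation, I need to estimate the median and arbitrary quantiles of hundreds of different vectors, each with tens of thousands of points (and associated probabilities). I have seen this suggestion, in which the probability density function is estimated using the trapezoidal method, but I am not sure it is the best approach.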

Any advice you can offer would be greatly appreciated. Thanks!

Answer 1

Something like this should work, but be careful with the weights vector, as shown below.

#your data
set.seed(123)
data1 <- data.frame(Student=c(LETTERS[1:10]), Height.cm=sort( rnorm(n=10, mean=140, sd=10)), Prob.in.class=c(1,.75,1,.5,1,1,1,.25,1,.5))

#Test a known ...
data2 <- c(1,1,1,1,1,2,3,3,3,3,3) # median clearly 2
median(data2) #yields 2, yah... 

#using weights... median should be 2 if function working right
data3 <- data.frame(Student=c(LETTERS[1:3]), Height.cm=c(1,2,3), Prob.in.class=c(5/12,2/12,5/12))
reldist::wtd.quantile(data3$Height.cm, q = .5, 
                  weight = data3$Prob.in.class) # yields 3, not the right answer

# wtd.quantile does not seem to like weights given as probabilities.
# Scaling the weights to something greater than 1 seems to work.
reldist::wtd.quantile(data3$Height.cm, q = .5, weight = data3$Prob.in.class*100) # yields 2, the right answer
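
If you would rather not depend on how a package scales its weights, a weighted quantile can also be sketched in base R from the cumulative weights. The helper name `weighted_quantile` and its arguments are hypothetical, not from any package. Note that this treats the attendance probabilities as plain weights, so it gives a weighted quantile, which is not guaranteed to equal the expected median under random attendance.

```r
# Minimal base-R sketch of a weighted quantile (hypothetical helper).
# Returns, for each prob, the smallest x whose normalized cumulative
# weight reaches that prob. The result does not depend on weight scale.
weighted_quantile <- function(x, w, probs = 0.5) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)                    # sort values, carry weights along
  x <- x[ord]
  cum_w <- cumsum(w[ord]) / sum(w)   # normalized cumulative weights
  # small tolerance guards against floating-point round-off
  sapply(probs, function(p) x[which(cum_w >= p - 1e-12)[1]])
}

# Same toy data as above: both calls give 2, whatever the weight scale
weighted_quantile(c(1, 2, 3), c(5/12, 2/12, 5/12))        # 2
weighted_quantile(c(1, 2, 3), c(5/12, 2/12, 5/12) * 100)  # 2
```

The cost is dominated by the sort, so it should be workable for vectors of tens of thousands of points, though I have not benchmarked it against `reldist::wtd.quantile`.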
