How to calculate the 99th percentile of a dataset

Asked: 2020-02-19 15:32:52

Tags: r

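I have a dataset of mean weights for two sample sizes, from 100,000 trials. I am trying to find the 99th percentile, but I don't know how. I found the quartiles and the median by doing the following: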

summary(Lifts)
 Large           Small      
 Min.   : 62.5   Min.   : 54.2  
 1st Qu.: 99.1   1st Qu.: 96.0  
 Median :106.0   Median :106.0  
 Mean   :106.0   Mean   :106.0  
 3rd Qu.:112.9   3rd Qu.:116.0  
 Max.   :147.5   Max.   :156.8 

I tried using the quantile command to find the 99th percentile of both Large and Small:

quantile(Lifts, probs = c(0, 0.25, 0.50, 0.99))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) : 
  undefined columns selected

but I get the error shown above.

Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 3)

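quantile expects a numeric vector, not a whole data frame. If we specify a single column (e.g., with the $ notation), the error goes away: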

quantile(Lifts$Large, probs = c(0, 0.25, 0.50, 0.99))

quantile(Lifts$Small, probs = c(0, 0.25, 0.50, 0.99))
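
If only the 99th percentile is wanted, a single probability can be passed instead of a vector. The following is a minimal sketch, assuming a simulated stand-in for Lifts (the values are made up and are not the asker's data):

```r
# Hypothetical stand-in for the asker's data frame; the values are simulated.
set.seed(1)
Lifts <- data.frame(Large = rnorm(1000, mean = 106, sd = 10),
                    Small = rnorm(1000, mean = 106, sd = 15))

# quantile() works on a numeric vector, so pass one column at a time.
p99_large <- quantile(Lifts$Large, probs = 0.99)
p99_small <- quantile(Lifts$Small, probs = 0.99)

# names = FALSE drops the "99%" label and returns a bare number.
p99_large_plain <- quantile(Lifts$Large, probs = 0.99, names = FALSE)

print(p99_large)
print(p99_small)
```

The named form is handy for printing; the unnamed form is easier to use in further arithmetic.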

Answer 1: (score: 2)

In general, to apply a function to every column of a data frame we can use lapply, which also works with quantile.

lapply(lifts, quantile, probs=c(0, 0.25, 0.50, 0.99))
# $large
#       0%     25%     50%     99% 
#   14.400 161.675 488.450 950.506 
# 
# $small
#      0%    25%    50%    99% 
#   0.900 30.800 43.650 97.744 

We can also use sapply to do the same thing, but it returns the result as a matrix rather than a list.

sapply(lifts, quantile, probs=c(0, 0.25, 0.50, 0.99))
#       large  small
# 0%   14.400  0.900
# 25% 161.675 30.800
# 50% 488.450 43.650
# 99% 950.506 97.744
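
When only the 99th percentile is needed, the same pattern can return one value per column. This is a rough sketch, assuming a simulated data frame named demo (a hypothetical stand-in, not the data used in this answer):

```r
# Hypothetical data frame standing in for `lifts`; the values are simulated.
set.seed(42)
demo <- data.frame(large = runif(50, min = 0, max = 1000),
                   small = runif(50, min = 0, max = 100))

# names = FALSE drops the "99%" label, so the result is named by column only.
p99 <- sapply(demo, quantile, probs = 0.99, names = FALSE)

# vapply does the same but checks that each result is a single numeric value.
p99_checked <- vapply(demo, quantile, numeric(1), probs = 0.99, names = FALSE)

print(p99)
```

vapply is the stricter choice in scripts, since it fails loudly if a column yields something other than one number.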

Data

lifts <- structure(list(large = c(489.9, 734.5, 905.6, 41.9, 950.2, 73.9, 
444.7, 950.8, 303.9, 539, 399.4, 429.5, 670.2, 39.1, 324.6, 829.6, 
97.9, 216.6, 500.1, 364.4, 762.6, 205.7, 191.6, 128.6, 749.2, 
185, 736.9, 46.9, 114.2, 774.4, 626.5, 42.5, 52.5, 724.3, 518.3, 
932.7, 602.5, 14.4, 794.9, 149.7, 621.6, 674.2, 685.1, 153.9, 
42.3, 487, 787.5, 351.6, 689.3, 862.3), small = c(56.5, 63.6, 
49.5, 76.7, 78, 25.8, 57.8, 19.2, 27.7, 38.3, 36.4, 4.4, 89.2, 
68.8, 36.1, 71.8, 69.1, 35.8, 38.2, 26.9, 95.5, 30.7, 43.2, 58.8, 
44.1, 35.4, 91.2, 37.1, 99.9, 94.5, 52, 38.2, 40.1, 50.9, 81.7, 
7.5, 77.5, 71.9, 70.6, 8.2, 90.1, 31.1, 3.4, 52, 0.9, 30.5, 12.7, 
45.6, 34.2, 13.4)), class = "data.frame", row.names = c(NA, -50L
))