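I have a dataset of mean weights for two sample sizes, from 100,000 trials. I am trying to find the 99th percentile but I don't know how to do it. I found the quartiles and the median by doing this: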
summary(Lifts)
     Large            Small
 Min.   : 62.5   Min.   : 54.2
 1st Qu.: 99.1   1st Qu.: 96.0
 Median :106.0   Median :106.0
 Mean   :106.0   Mean   :106.0
 3rd Qu.:112.9   3rd Qu.:116.0
 Max.   :147.5   Max.   :156.8
I have tried using the quantile command to find the 99th percentile for both Large and Small:
quantile(Lifts, probs = c(0, 0.25, 0.50, 0.99))
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
undefined columns selected
but I get the error above.
Any help would be appreciated.
Answer 0: (score: 3)
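quantile expects a numeric vector, but Lifts is a data frame, which is why the call fails. If we specify the column (e.g., using the $ notation), we get rid of the error: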
quantile(Lifts$Large, probs = c(0, 0.25, 0.50, 0.99))
or
quantile(Lifts$Small, probs = c(0, 0.25, 0.50, 0.99))
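If only the 99th percentile is wanted, probs can be a single value. Here is a minimal sketch; Lifts_demo is a hypothetical stand-in for the asker's Lifts with made-up values, so the numbers will not match the real data:

```r
# Hypothetical stand-in for the asker's data frame; the values are made up.
Lifts_demo <- data.frame(Large = c(1, 2, 3, 4, 5),
                         Small = c(10, 20, 30, 40, 50))

# Select one numeric column with $ and ask for a single probability.
p99_large <- quantile(Lifts_demo$Large, probs = 0.99)
p99_small <- quantile(Lifts_demo$Small, probs = 0.99)

print(p99_large)
print(p99_small)
```

Each result is a length-one numeric vector named after the requested probability.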
Answer 1: (score: 2)
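In general, to apply a function to all columns of a data frame we can use lapply, which also works with quantile. It returns a list with one element per column.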
lapply(lifts, quantile, probs=c(0, 0.25, 0.50, 0.99))
# $large
#      0%     25%     50%     99%
#  14.400 161.675 488.450 950.506
#
# $small
#     0%    25%    50%    99%
#  0.900 30.800 43.650 97.744
We can also use sapply, which does the same thing but simplifies the output to a matrix.
sapply(lifts, quantile, probs=c(0, 0.25, 0.50, 0.99))
#       large  small
# 0%   14.400  0.900
# 25% 161.675 30.800
# 50% 488.450 43.650
# 99% 950.506 97.744
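Because the sapply result is a matrix, the 99th percentile of every column can be pulled out by row name. A small sketch of that idea, assuming a made-up data frame (demo is hypothetical and not the data used in this answer):

```r
# Hypothetical stand-in data frame; the values are made up for illustration.
demo <- data.frame(large = c(1, 2, 3, 4, 5),
                   small = c(10, 20, 30, 40, 50))

# sapply() simplifies the per-column results into a matrix:
# one row per probability, one column per data frame column.
q <- sapply(demo, quantile, probs = c(0, 0.25, 0.50, 0.99))

# Pull out the 99th percentile of every column by its row name.
p99 <- q["99%", ]
print(p99)
```

Indexing by the "99%" row name rather than by position keeps the code working if the probs vector is reordered.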
Data
lifts <- structure(list(large = c(489.9, 734.5, 905.6, 41.9, 950.2, 73.9,
444.7, 950.8, 303.9, 539, 399.4, 429.5, 670.2, 39.1, 324.6, 829.6,
97.9, 216.6, 500.1, 364.4, 762.6, 205.7, 191.6, 128.6, 749.2,
185, 736.9, 46.9, 114.2, 774.4, 626.5, 42.5, 52.5, 724.3, 518.3,
932.7, 602.5, 14.4, 794.9, 149.7, 621.6, 674.2, 685.1, 153.9,
42.3, 487, 787.5, 351.6, 689.3, 862.3), small = c(56.5, 63.6,
49.5, 76.7, 78, 25.8, 57.8, 19.2, 27.7, 38.3, 36.4, 4.4, 89.2,
68.8, 36.1, 71.8, 69.1, 35.8, 38.2, 26.9, 95.5, 30.7, 43.2, 58.8,
44.1, 35.4, 91.2, 37.1, 99.9, 94.5, 52, 38.2, 40.1, 50.9, 81.7,
7.5, 77.5, 71.9, 70.6, 8.2, 90.1, 31.1, 3.4, 52, 0.9, 30.5, 12.7,
45.6, 34.2, 13.4)), class = "data.frame", row.names = c(NA, -50L
))