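Suppose I have 1 million observations from a Gamma distribution with parameters (3, 5), i.e. shape 3 and rate 5. I can use summary() to find the quantiles, but how do I find out how many observations fall between each pair of red lines, where the red lines split the data into 10 parts?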
a = rgamma(1e6, shape = 3, rate = 5)
summary(a)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0053 0.3455 0.5351 0.6002 0.7845 4.4458
Answer (score: 3)
We can use cut together with table:
table(cut(a, quantile(a, 0:10 / 10)))
# (0.00202,0.22] (0.22,0.307] (0.307,0.382] (0.382,0.457] (0.457,0.535] (0.535,0.622]
# 99999 100000 100000 100000 100000 100000
# (0.622,0.724] (0.724,0.856] (0.856,1.07] (1.07,3.81]
# 100000 100000 100000 100000
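The first bin above shows 99999 rather than 100000. By default cut() builds right-closed intervals (lo, hi], so the smallest observation, which is exactly the first break quantile(a, 0), falls outside every bin and becomes NA, and table() silently drops it. A minimal sketch of the fix, assuming you want every observation counted, is to pass include.lowest = TRUE. The seed is arbitrary and only there for reproducibility.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
a <- rgamma(1e6, shape = 3, rate = 5)
breaks <- quantile(a, 0:10 / 10)

# Default: intervals are (lo, hi], so min(a) == breaks[1] becomes NA
counts_default <- table(cut(a, breaks))

# include.lowest = TRUE closes the first interval: [lo, hi]
counts <- table(cut(a, breaks, include.lowest = TRUE))

sum(counts_default)  # 999999: the minimum was dropped
sum(counts)          # 1000000: every observation is counted
counts
```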
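But given what quantiles are, this is not very interesting: by construction, each bin holds 10% of the data. Perhaps you would also like to try the theoretical quantiles: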
table(cut(a, qgamma(0:10 / 10, 3, 5)))
# (0,0.22] (0.22,0.307] (0.307,0.383] (0.383,0.457] (0.457,0.535] (0.535,0.621] (0.621,0.723]
# 99978 100114 100545 99843 99273 99644 100104
# (0.723,0.856] (0.856,1.06] (1.06,Inf]
# 100208 99883 100408
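Nothing interesting here either: if your data really follow a gamma distribution and you have a large number of observations, you can be sure that roughly a fraction x of them lies between the q-th and the (q + x)-th theoretical quantiles. In smaller samples, the second approach may be more informative.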
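To see why the theoretical breaks become informative only when the sample is small, here is a sketch with a hypothetical sample of 200 draws (the sample size and seed are arbitrary choices). Each bin is expected to hold n / 10 = 20 observations. The observed counts scatter around that value, and a Pearson chi-squared goodness-of-fit test (an addition, not part of the original answer) indicates whether the deviation is larger than chance alone would explain.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 200       # deliberately small, hypothetical sample size
b <- rgamma(n, shape = 3, rate = 5)

# Theoretical decile breaks; qgamma(0, ...) is 0 and qgamma(1, ...) is Inf,
# so every positive draw lands in some bin
obs <- table(cut(b, qgamma(0:10 / 10, shape = 3, rate = 5)))

# Observed counts next to the expected n / 10 per bin
rbind(observed = as.vector(obs), expected = rep(n / 10, 10))

# Goodness-of-fit test against equal 10% probabilities per bin
gof <- chisq.test(as.vector(obs), p = rep(0.1, 10))
gof$p.value
```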
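Edit: Given your updated question, it is clear that by 10%, 20%, ... you do not mean quantiles. Say the minimum is 0 and the maximum is 2; if by 10% you mean 0.2, then you want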
table(cut(a, seq(min(a), max(a), length.out = 10 + 1)))
# (0.00418,0.428] (0.428,0.853] (0.853,1.28] (1.28,1.7] (1.7,2.13] (2.13,2.55]
# 361734 436176 155332 37489 7651 1335
# (2.55,2.97] (2.97,3.4] (3.4,3.82] (3.82,4.25]
# 231 38 11 2
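The same equal-width idea as a self-contained sketch. The 11 break points are spaced (max(a) - min(a)) / 10 apart. include.lowest = TRUE keeps the minimum, which is why the counts above sum to 999999 rather than 1000000. prop.table() then turns the counts into the share of observations per bin, which is closer to the "10%, 20%" wording. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
a <- rgamma(1e6, shape = 3, rate = 5)

# 11 equally spaced break points give 10 bins of equal width
breaks <- seq(min(a), max(a), length.out = 10 + 1)
width <- (max(a) - min(a)) / 10

# include.lowest = TRUE so min(a) is counted in the first bin
counts <- table(cut(a, breaks, include.lowest = TRUE))

# Fraction of observations in each bin instead of raw counts
shares <- prop.table(counts)
round(shares, 4)
```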