Per-group count of values below the whole-column quantiles, computed for every column

Time: 2016-09-23 14:41:37

Tags: r quantile binning

I have a data.frame with two groups and two variables (besides the grouping variable), for example:

set.seed(1729)
temp <- data.frame(group=c(1,2),value1=rnorm(12),value2=rnorm(12))
temp = temp[order(temp$group),]
# group      value1      value2
#     1  0.21531616  0.08679615
#     1  1.08604925  0.36344973
#     1  1.04225410  0.53281840
#     1 -1.40843189  0.52096971
#     1 -0.07130541 -0.47550518
#     1  0.18839979  1.96241245
#     2  0.18374784 -0.64102941
#     2  0.02871298 -0.67746579
#     2  0.08826553 -0.32679060
#     2  0.05522136 -0.31371224
#     2  0.36086719 -0.10004339
#     2 -0.55618926  1.22760816

I computed the quantiles of temp$value1 and temp$value2:

qt1 = quantile(temp$value1,probs = c(0.25,0.5,0.75))
qt2 = quantile(temp$value2,probs = c(0.25,0.5,0.75))

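For each group, I need (1) the number of temp$value1 values < qt1[1], (2) the number of temp$value1 values < qt1[2], and (3) the number of temp$value1 values < qt1[3]. Likewise for temp$value2 against qt2, which gives another six counts (two groups times three quantiles).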

In code, it looks like this (I manually copied each group + variable combination into a vector to illustrate what I want):

g1v1=c(0.21531616,1.08604925,1.04225410,-1.40843189,-0.07130541,0.18839979)
length(g1v1[g1v1<qt1[1]])
length(g1v1[g1v1<qt1[2]])
length(g1v1[g1v1<qt1[3]])

g2v1=c(0.18374784,0.02871298,0.08826553,0.05522136,0.36086719,-0.55618926)
length(g2v1[g2v1<qt1[1]])
length(g2v1[g2v1<qt1[2]])
length(g2v1[g2v1<qt1[3]])

g1v2=c(0.08679615,0.36344973,0.53281840,0.52096971,-0.47550518,1.96241245)
length(g1v2[g1v2<qt2[1]])
length(g1v2[g1v2<qt2[2]])
length(g1v2[g1v2<qt2[3]])
#similarly for g2v2

The output must be a data.frame like:

# group value1.25.ct value1.50.ct value1.75.ct value2.25.ct value2.50.ct value2.75.ct
#     1            2            2            4            1            1            4
#     2            1            4            5            2            5            5

Please recommend an efficient way to do this. Thanks.
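For reference, one possible base-R sketch of the computation described above. It is only an illustration, not an established answer: the helper name count_below and the variables probs and value_cols are made up for this example, and the data are typed in from the printed table rather than regenerated with set.seed, so the result does not depend on the random number generator. The idea is to compare each column against its whole-column quantiles with outer() and then sum the resulting TRUE/FALSE flags per group with rowsum().

```r
# Data copied from the printed table in the question (hypothetical re-entry,
# used so the example does not depend on the RNG state).
temp <- data.frame(
  group  = rep(c(1, 2), each = 6),
  value1 = c(0.21531616, 1.08604925, 1.04225410, -1.40843189, -0.07130541, 0.18839979,
             0.18374784, 0.02871298, 0.08826553, 0.05522136, 0.36086719, -0.55618926),
  value2 = c(0.08679615, 0.36344973, 0.53281840, 0.52096971, -0.47550518, 1.96241245,
             -0.64102941, -0.67746579, -0.32679060, -0.31371224, -0.10004339, 1.22760816)
)

probs      <- c(0.25, 0.50, 0.75)     # quantile levels used in the question
value_cols <- c("value1", "value2")   # every column except the grouping variable

# For one column: a groups-by-quantiles matrix of counts strictly below each cut-off.
count_below <- function(col) {
  cuts   <- quantile(temp[[col]], probs = probs)   # quantiles of the WHOLE column
  below  <- outer(temp[[col]], cuts, "<")          # n x 3 logical matrix
  counts <- rowsum(below + 0L, temp$group)         # sum the flags within each group
  colnames(counts) <- paste0(col, ".", probs * 100, ".ct")
  counts
}

# Bind the per-column count matrices side by side and add the group column.
counts <- do.call(cbind, lapply(value_cols, count_below))
result <- data.frame(group = as.numeric(rownames(counts)), counts, row.names = NULL)
print(result)
```

With the data as printed in the question, this should give the same counts as the desired output table. Because outer() and rowsum() are vectorised, the same function can be applied to more columns by extending value_cols, without copying each group into its own vector.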

0 answers:

No answers yet