R code to display the actual continuous values stored in each bin?

Asked: 2015-07-24 23:49:46

Tags: r categories cut continuous binning

As a simple example, let's "bin" 1000 data points (continuous values) into 10 bins (categories), with 100 data points in each bin:

x <- rnorm(1000, mean=0, sd=50)

# Next, let's say we want to create ten bins 
# with equal number of observations (100), in each bin:
bins <- 10
cutpoints <- quantile(x,(0:bins)/bins)

# The cutpoints variable 
# holds a vector of the cutpoints used to bin the data.   

# Finally we perform the binning to form the categories variable:

binned <- cut(x, cutpoints, include.lowest = TRUE)
summary(binned)
   [-152,-61]     (-61,-40]   (-40,-23.9] 
          100           100           100 
(-23.9,-10.2]  (-10.2,2.86]   (2.86,15.4] 
          100           100           100 
  (15.4,25.9]   (25.9,44.1]   (44.1,64.7] 
          100           100           100 
   (64.7,186] 
          100 

As you can see, the final summary call gives the number of x values in each bin (i.e. 100 values per bin).

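My question:
How can I display the actual 100 x values in each bin, together with each value's row number (or rowname)?

What is the actual R code to get a 3-column data frame (columns: Bin, Rowname and Values) structured like this?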

       Bin  Rowname  Values
[-152,-61]     [25]   -78.2
               [28]   -82.1
               [75]   -99.7  etc.

 (-61,-40]     [18]   -45.0
               [26]   -68.4  etc.

Thanks!

1 answer:

Answer 0: (score: 3)

You have already done everything you need, apart from wrapping it in a data.frame:

head(data.frame(Values=x, Bin=binned, Rowname=seq_along(x))[order(binned), ])
#       Values          Bin Rowname
# 2  -66.88718 [-189,-64.7]       2
# 5  -99.08521 [-189,-64.7]       5
# 8  -95.06063 [-189,-64.7]       8
# 10 -95.04592 [-189,-64.7]      10
# 15 -78.48819 [-189,-64.7]      15
# 28 -78.49396 [-189,-64.7]      28

You don't need a separate column for the row names, because a data.frame keeps its row names as an attribute, which you can read with rownames(yourData).
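
To illustrate the point about row names, here is a minimal sketch that rebuilds the question's setup and checks that the row names survive sorting by bin. The seed and the variable names df, sorted, idx and by_bin are assumptions added for illustration and are not part of the original post. It also shows split() as one possible way to get the values grouped per bin.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
x <- rnorm(1000, mean = 0, sd = 50)
bins <- 10
cutpoints <- quantile(x, (0:bins) / bins)
binned <- cut(x, cutpoints, include.lowest = TRUE)

# One row per observation; row names default to "1", "2", ..., "1000"
df <- data.frame(Bin = binned, Values = x)

# Sorting by bin reorders the rows but keeps each row's original name
sorted <- df[order(df$Bin), ]

# The row names still point back to the original positions in x
idx <- as.integer(rownames(sorted))
identical(x[idx], sorted$Values)

# If an explicit column is wanted anyway, copy the row names into one
sorted$Rowname <- rownames(sorted)
head(sorted)

# Alternative view: a named list with one numeric vector of values per bin
by_bin <- split(x, binned)
lengths(by_bin)
```

Keeping the index in the row names avoids a redundant column, while the explicit Rowname column is convenient if the data frame is later exported or merged, where row names can be dropped.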