View the values that make up binned data as a factor

Time: 2015-04-20 16:15:42

Tags: r

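So I have two columns of data that I bin. After binning, I split the binned data of one column by the bins of the other and build a frequency table from it. Using the following example: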

# Example data: two numeric columns
column1 <- c(100.01, 100.015, 100.017, 100.071, 100.099, 100.111, 100.153, 100.167)
column2 <- c(0.89, 0.64, -0.14, -0.79, 1, 0.31, -0.27, 0.45)
test <- cbind(column1, column2)
# Bin edges for each column
bin1 <- seq(100, 100.2, by = 0.05)
bin2 <- seq(-1, 1, by = 0.5)
# Cut each column with its own set of breaks
res <- data.frame(Map(function(x, y) cut(x, breaks = y),
                      as.data.frame(test), list(bin1, bin2)))

res1 <- cbind(test, res)
res1
  column1 column2        column1   column2
1 100.010    0.89   (100,100.05]   (0.5,1]
2 100.015    0.64   (100,100.05]   (0.5,1]
3 100.017   -0.14   (100,100.05]  (-0.5,0]
4 100.071   -0.79 (100.05,100.1] (-1,-0.5]
5 100.099    1.00 (100.05,100.1]   (0.5,1]
6 100.111    0.31 (100.1,100.15]   (0,0.5]
7 100.153   -0.27 (100.15,100.2]  (-0.5,0]
8 100.167    0.45 (100.15,100.2]   (0,0.5]
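A caveat on the binning step, shown as a small base R sketch: cut() builds right-closed intervals (a,b] by default, so a value equal to the lowest break (for example a column2 value of exactly -1 with bin2 starting at -1) becomes NA unless include.lowest = TRUE is set. The value -1 is hypothetical; it does not occur in the example data above.

```r
# Breaks used for column2 in the question
bin2 <- seq(-1, 1, by = 0.5)

# Default (right = TRUE, include.lowest = FALSE): intervals are (a,b],
# so -1 is not inside any bin and the result is NA
default_bin <- cut(-1, breaks = bin2)
print(default_bin)

# include.lowest = TRUE closes the first interval on the left: [-1,-0.5]
closed_bin <- cut(-1, breaks = bin2, include.lowest = TRUE)
print(as.character(closed_bin))
```

With the example data this does not change any result, since no value sits exactly on the lowest break of either column.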

I split the binned column2 data by the column1 bins and tabulate the result, so it looks like this:

Freq <- do.call(rbind, lapply(split(res1[,4], res1[,3]),table))
Freq
               (-1,-0.5] (-0.5,0] (0,0.5] (0.5,1]
(100,100.05]           0        1       0       2
(100.05,100.1]         1        0       0       1
(100.1,100.15]         0        0       1       0
(100.15,100.2]         0        1       1       0
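As a side note, a minimal base R sketch (assuming the same example data, rebuilt here so the snippet stands alone) suggests the split/lapply/rbind step can be replaced by a single table() call on the two bin factors, which cross-tabulates them directly. The names b1, b2, freq_split and freq_table are illustrative only.

```r
# Rebuild the binned factors from the question's data
column1 <- c(100.01, 100.015, 100.017, 100.071, 100.099, 100.111, 100.153, 100.167)
column2 <- c(0.89, 0.64, -0.14, -0.79, 1, 0.31, -0.27, 0.45)
b1 <- cut(column1, breaks = seq(100, 100.2, by = 0.05))
b2 <- cut(column2, breaks = seq(-1, 1, by = 0.5))

# Split-then-table approach from the question
freq_split <- do.call(rbind, lapply(split(b2, b1), table))

# table() with two factors gives the same counts in one call
# (rows = column1 bins, columns = column2 bins)
freq_table <- table(b1, b2)
print(freq_table)
```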

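From this, I would like to be able to look at the values that fall into each pairing. For example, the pairing of (100,100.05] and (0.5,1] contains two values; I want a way to retrieve the column1 values that fall into a given pair of bins and compute their mean. Using the example above, if I look at all values in the column2 bin (0.5,1], I want this output: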

                (0.5,1]
(100,100.05]     100.0125
(100.05,100.1]   100.099
(100.1,100.15]   NA
(100.15,100.2]   NA 
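For comparison, a base R sketch that needs no extra packages: tapply() over the two bin factors computes the mean of column1 for every pair of bins, and empty pairs come back as NA, which lines up with the NA cells in the desired output. The same two factors can also be used as a logical filter to pull out the raw column1 values of one cell. The names b1, b2, mean_tab and cell_values are illustrative, not from the question.

```r
# Rebuild the binned factors from the question's data
column1 <- c(100.01, 100.015, 100.017, 100.071, 100.099, 100.111, 100.153, 100.167)
column2 <- c(0.89, 0.64, -0.14, -0.79, 1, 0.31, -0.27, 0.45)
b1 <- cut(column1, breaks = seq(100, 100.2, by = 0.05))
b2 <- cut(column2, breaks = seq(-1, 1, by = 0.5))

# Mean of column1 for every (b1, b2) combination; empty cells are NA
mean_tab <- tapply(column1, list(b1, b2), mean)

# Only the column2 bin (0.5,1], kept as a one-column matrix
print(mean_tab[, "(0.5,1]", drop = FALSE])

# Raw column1 values in one particular cell: (100,100.05] x (0.5,1]
cell_values <- column1[b1 == "(100,100.05]" & b2 == "(0.5,1]"]
print(cell_values)
```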

Thanks

1 Answer

Answer 0 (score: 1)

You could try:

res1 <- data.frame(test, res)
library(reshape2)
res2 <- dcast(res1, column1.1~column2.1, value.var='column1', mean)
res2
#      column1.1 (-1,-0.5] (-0.5,0] (0,0.5]  (0.5,1]
#1   (100,100.05]       NaN  100.017     NaN 100.0125
#2 (100.05,100.1]   100.071      NaN     NaN 100.0990
#3 (100.1,100.15]       NaN      NaN 100.111      NaN
#4 (100.15,100.2]       NaN  100.153 100.167      NaN

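If you need the mean of both 'column1' and 'column2' at the same time, you can use dcast from the development version of data.table (v1.9.5), which accepts multiple value.var columns. Instructions for installing the devel version are here.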

library(data.table)  # v1.9.5+
dcast(setDT(res1), column1.1~column2.1, 
         value.var=c('column1', 'column2'), mean)
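If a development build is not an option, a base R alternative (a swapped-in technique, not what the answer proposes) is aggregate() with cbind() on the left-hand side of the formula. It returns the mean of both columns per bin pair in long rather than wide format, and only non-empty combinations appear as rows. The object name both_means is illustrative.

```r
# Rebuild res1 with the same column names data.frame(test, res) produces
column1 <- c(100.01, 100.015, 100.017, 100.071, 100.099, 100.111, 100.153, 100.167)
column2 <- c(0.89, 0.64, -0.14, -0.79, 1, 0.31, -0.27, 0.45)
res1 <- data.frame(column1, column2,
                   column1.1 = cut(column1, breaks = seq(100, 100.2, by = 0.05)),
                   column2.1 = cut(column2, breaks = seq(-1, 1, by = 0.5)))

# Mean of both numeric columns for each non-empty (column1.1, column2.1) pair
both_means <- aggregate(cbind(column1, column2) ~ column1.1 + column2.1,
                        data = res1, FUN = mean)
print(both_means)
```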