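I have two columns of data that I bin. After binning, I split the binned data of one column by the bins of the other and build a frequency table from it. Using the following example: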
column1 <- as.numeric(c("100.01", "100.015", "100.017", "100.071", "100.099", "100.111", "100.153", "100.167"))
column2 <- as.numeric(c("0.89", "0.64", "-0.14", "-0.79", "1", "0.31", "-0.27", "0.45"))
test <- cbind(column1, column2)
bin1 <- seq(100, 100.2, by = 0.05)
bin2 <- seq(-1, 1, by = 0.5)
res <- data.frame(Map(function(x,y) cut(x, breaks=y),
as.data.frame(test), list(bin1, bin2)))
res1 <- cbind(test, res)
res1
column1 column2 column1 column2
1 100.010 0.89 (100,100.05] (0.5,1]
2 100.015 0.64 (100,100.05] (0.5,1]
3 100.017 -0.14 (100,100.05] (-0.5,0]
4 100.071 -0.79 (100.05,100.1] (-1,-0.5]
5 100.099 1.00 (100.05,100.1] (0.5,1]
6 100.111 0.31 (100.1,100.15] (0,0.5]
7 100.153 -0.27 (100.15,100.2] (-0.5,0]
8 100.167 0.45 (100.15,100.2] (0,0.5]
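I want to split the binned column 2 data by column 1, and then determine the median of the column 1 values that make up each column 1 bin. The frequency table looks like this: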
Freq <- do.call(rbind, lapply(split(res1[,4], res1[,3]),table))
Freq
(-1,-0.5] (-0.5,0] (0,0.5] (0.5,1]
(100,100.05] 0 1 0 2
(100.05,100.1] 1 0 0 1
(100.1,100.15] 0 0 1 0
(100.15,100.2] 0 1 1 0
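From this I want to be able to look at the values that fall into each pairing. For example, for (100,100.05] and (0.5,1] (a pairing that contains two values), I want a way to retrieve the column 1 values that fall into that bin and compute their mean. So, using the example above, if I wanted to look at all the values in the bin (0.5,1], I would want this output: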
(0.5,1]
(100,100.05] 100.0125
(100.05,100.1] 100.099
(100.1,100.15] NA
(100.15,100.2] NA
Thanks
Answer 0: (score: 1)
You could try
res1 <- data.frame(test, res)
library(reshape2)
res2 <- dcast(res1, column1.1 ~ column2.1, value.var = 'column1', fun.aggregate = mean)
res2
# column1.1 (-1,-0.5] (-0.5,0] (0,0.5] (0.5,1]
#1 (100,100.05] NaN 100.017 NaN 100.0125
#2 (100.05,100.1] 100.071 NaN NaN 100.0990
#3 (100.1,100.15] NaN NaN 100.111 NaN
#4 (100.15,100.2] NaN 100.153 100.167 NaN
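For comparison, the same table of means can be sketched in base R with tapply, which also leaves empty cells as NA (as in the desired output in the question) rather than NaN. This is only a minimal alternative, not part of the answer's method. It assumes the same data and breaks as the question, and the names bin1_f, bin2_f and means are made up for this illustration.

```r
# Same data and breaks as in the question
column1 <- c(100.01, 100.015, 100.017, 100.071, 100.099, 100.111, 100.153, 100.167)
column2 <- c(0.89, 0.64, -0.14, -0.79, 1, 0.31, -0.27, 0.45)

# Bin each column (bin1_f / bin2_f are illustrative names)
bin1_f <- cut(column1, breaks = seq(100, 100.2, by = 0.05))
bin2_f <- cut(column2, breaks = seq(-1, 1, by = 0.5))

# Mean of column1 for every (column1 bin, column2 bin) pairing;
# pairings with no observations are left as NA
means <- tapply(column1, list(bin1_f, bin2_f), mean)

# Keep only the (0.5,1] column, as a one-column matrix
means[, "(0.5,1]", drop = FALSE]
```

Swapping mean for median in the tapply call gives the per-pairing median instead.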
If you need the mean of both 'column1' and 'column2', you can use dcast from the devel version of data.table, i.e. v1.9.5, which can take multiple value.vars. Instructions for installing the devel version are here.
library(data.table)  # v1.9.5+
dcast(setDT(res1), column1.1 ~ column2.1,
      value.var = c('column1', 'column2'), fun.aggregate = mean)
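The question also asks how to retrieve the raw column 1 values behind a single pairing. One possible sketch in base R is plain logical subsetting on the two bin factors. The helper values_in_bins and the names bin1_f and bin2_f are hypothetical, introduced only for this illustration, and the data are the same as in the question.

```r
# Same data and breaks as in the question
column1 <- c(100.01, 100.015, 100.017, 100.071, 100.099, 100.111, 100.153, 100.167)
column2 <- c(0.89, 0.64, -0.14, -0.79, 1, 0.31, -0.27, 0.45)
bin1_f <- cut(column1, breaks = seq(100, 100.2, by = 0.05))
bin2_f <- cut(column2, breaks = seq(-1, 1, by = 0.5))

# Hypothetical helper: column1 values whose column1 bin is `row_bin`
# and whose column2 bin is `col_bin`
values_in_bins <- function(row_bin, col_bin) {
  column1[bin1_f == row_bin & bin2_f == col_bin]
}

vals <- values_in_bins("(100,100.05]", "(0.5,1]")
vals        # the two values in this pairing
mean(vals)  # their mean, matching the corresponding cell of the dcast result
```

The bin labels must be typed exactly as cut prints them, including the round and square brackets.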