In R, bin a data frame by a fixed interval size and show each value with its lower and upper bin bounds

Time: 2017-01-02 11:25:04

Tags: r dplyr

I have the following data frame, dt.

dt <- data.frame(
  No= c(14000,17000,48452,94632,36541,20000,100000,46241,78941,32464,69872,90000)) 

The expected output is:

       No             bin lower  upper freq
1   14000 [1e+04,1.5e+04] 10000  15000    1
2   17000 (1.5e+04,2e+04] 15000  20000    2
3   20000 (1.5e+04,2e+04] 15000  20000    2
4   32464 (3e+04,3.5e+04] 30000  35000    1
5   36541 (3.5e+04,4e+04] 35000  40000    1
6   46241 (4.5e+04,5e+04] 45000  50000    2
7   48452 (4.5e+04,5e+04] 45000  50000    2
8   69872 (6.5e+04,7e+04] 65000  70000    1
9   78941 (7.5e+04,8e+04] 75000  80000    1
10  90000 (8.5e+04,9e+04] 85000  90000    1
11  94632 (9e+04,9.5e+04] 90000  95000    1
12 100000 (9.5e+04,1e+05] 95000 100000    1

1 answer:

Answer 0: (score: 0)

Here is a dplyr solution.

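library(magrittr)  # for the %<>% assignment pipe
library(dplyr)

# Bin edges: 10000, 15000, ..., 100000
seq.no <- seq(10000, 100000, by=5000)
dt <- data.frame(
  No= c(14000,17000,48452,94632,36541,20000,100000,46241,78941,32464,69872,90000))

# Label each value with its bin and look up the bin's lower and upper edges.
# left.open=TRUE makes findInterval use (lower, upper] intervals, matching cut().
dt <- dt %>% arrange(No) %>%
  mutate(
    bin = cut(No, breaks=seq.no, include.lowest=TRUE),
    lower = seq.no[findInterval(No, seq.no, left.open=TRUE)],
    upper = seq.no[findInterval(No, seq.no, left.open=TRUE)+1])

# Count the rows in each bin and attach the count to every row
dt.freq <- group_by(dt, bin) %>% summarize(freq=n())
dt %<>% left_join(dt.freq, by="bin")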
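
As a possible simplification (a sketch, not part of the original answer): the integer codes of the factor returned by cut() are the interval indices, so lower and upper can be read from seq.no without calling findInterval, and the per-bin count can be added with a grouped mutate instead of a separate join. The names dt2 and idx are introduced here only for illustration.

```r
library(dplyr)

seq.no <- seq(10000, 100000, by = 5000)
dt <- data.frame(
  No = c(14000, 17000, 48452, 94632, 36541, 20000, 100000, 46241, 78941, 32464, 69872, 90000))

dt2 <- dt %>%
  arrange(No) %>%
  mutate(
    bin   = cut(No, breaks = seq.no, include.lowest = TRUE),
    idx   = as.integer(bin),   # position of the bin among the intervals
    lower = seq.no[idx],       # left edge of that interval
    upper = seq.no[idx + 1]) %>%
  group_by(bin) %>%
  mutate(freq = n()) %>%       # number of rows sharing the same bin
  ungroup() %>%
  select(-idx)

print(dt2)
```

Because the index comes from cut() itself, a value equal to the lowest break (10000) would still fall in the first bin thanks to include.lowest=TRUE.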

Edit: For earlier versions of R, where findInterval has no left.open argument, use the following in the mutate step.

dt <- dt %>% arrange(No) %>% 
  mutate(
    bin = cut(No, breaks=seq.no, include.lowest=TRUE), 
    lower = seq.no[length(seq.no)- findInterval(-No, -rev(seq.no))],
    upper = seq.no[(length(seq.no)- findInterval(-No, -rev(seq.no))) + 1]
)
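
To see why the fallback works, a small base R check may help (a sketch; idx.new and idx.old are illustrative names, and the first line needs a version of R whose findInterval accepts left.open). Negating the values and reversing the breaks turns the default left-closed intervals into right-closed ones, so both expressions should yield the same interval index.

```r
seq.no <- seq(10000, 100000, by = 5000)
No <- c(14000, 17000, 20000, 32464, 36541, 46241, 48452, 69872, 78941, 90000, 94632, 100000)

# Index of the (lower, upper] interval containing each value
idx.new <- findInterval(No, seq.no, left.open = TRUE)

# Same index without left.open: count breaks >= value, subtract from the total
idx.old <- length(seq.no) - findInterval(-No, -rev(seq.no))

stopifnot(identical(idx.new, idx.old))
print(data.frame(No, lower = seq.no[idx.old], upper = seq.no[idx.old + 1]))
```

Note that both forms give index 0 for a value exactly equal to the lowest break (10000), which the sample data does not contain.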