I have the following data frame, dt:
dt <- data.frame(
No= c(14000,17000,48452,94632,36541,20000,100000,46241,78941,32464,69872,90000))
The expected output should be:
       No             bin lower  upper freq
1   14000 [1e+04,1.5e+04] 10000  15000    1
2   17000 (1.5e+04,2e+04] 15000  20000    2
3   20000 (1.5e+04,2e+04] 15000  20000    2
4   32464 (3e+04,3.5e+04] 30000  35000    1
5   36541 (3.5e+04,4e+04] 35000  40000    1
6   46241 (4.5e+04,5e+04] 45000  50000    2
7   48452 (4.5e+04,5e+04] 45000  50000    2
8   69872 (6.5e+04,7e+04] 65000  70000    1
9   78941 (7.5e+04,8e+04] 75000  80000    1
10  90000 (8.5e+04,9e+04] 85000  90000    1
11  94632 (9e+04,9.5e+04] 90000  95000    1
12 100000 (9.5e+04,1e+05] 95000 100000    1
Answer 0 (score: 0)
Here is a dplyr solution:
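library(magrittr)  # for the %<>% assignment pipe
library(dplyr)

# Bin edges: 10000, 15000, ..., 100000
seq.no <- seq(10000, 100000, by = 5000)

dt <- data.frame(
  No = c(14000, 17000, 48452, 94632, 36541, 20000,
         100000, 46241, 78941, 32464, 69872, 90000))

dt <- dt %>% arrange(No) %>%
  mutate(
    # Bin label; intervals are closed on the right
    bin   = cut(No, breaks = seq.no, include.lowest = TRUE),
    # left.open = TRUE makes findInterval use the same (lower, upper] bins as cut
    lower = seq.no[findInterval(No, seq.no, left.open = TRUE)],
    upper = seq.no[findInterval(No, seq.no, left.open = TRUE) + 1])

# Count the rows in each bin, then attach the counts to every row
dt.freq <- group_by(dt, bin) %>% summarize(freq = n())
dt %<>% left_join(dt.freq, by = "bin")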
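As a possible variation on the above, the separate count-and-join step could be folded into the same pipeline with dplyr's add_count(). This is only a sketch, assuming a dplyr version in which add_count() accepts the name argument; res and idx are names introduced here for illustration and are not part of the original solution.

```r
library(dplyr)

seq.no <- seq(10000, 100000, by = 5000)
dt <- data.frame(
  No = c(14000, 17000, 48452, 94632, 36541, 20000,
         100000, 46241, 78941, 32464, 69872, 90000))

res <- dt %>%
  arrange(No) %>%
  mutate(
    bin   = cut(No, breaks = seq.no, include.lowest = TRUE),
    # idx is a temporary helper column (hypothetical name):
    # position of the (lower, upper] interval that contains No
    idx   = findInterval(No, seq.no, left.open = TRUE),
    lower = seq.no[idx],
    upper = seq.no[idx + 1]) %>%
  select(-idx) %>%
  # Add the number of rows per bin as a new column, without a join
  add_count(bin, name = "freq")
```

Computing the interval index once and reusing it for both lower and upper avoids calling findInterval twice; the resulting columns are in the same order as the expected output (No, bin, lower, upper, freq).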
Edit
For earlier versions of R, where findInterval does not have the left.open argument, use the following in the mutate step.
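dt <- dt %>% arrange(No) %>%
  mutate(
    bin   = cut(No, breaks = seq.no, include.lowest = TRUE),
    # Negating and reversing the breaks makes the default left-closed
    # intervals behave like right-closed ones
    lower = seq.no[length(seq.no) - findInterval(-No, -rev(seq.no))],
    upper = seq.no[(length(seq.no) - findInterval(-No, -rev(seq.no))) + 1]
  )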
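The fallback relies on the fact that negating the values and reversing the breaks flips which side of each interval is closed. A small base-R check along these lines could confirm that both index computations agree on the sample data. It is a sketch that has to run on an R version where left.open is available, and the names No, idx_new and idx_old are introduced here for illustration only.

```r
seq.no <- seq(10000, 100000, by = 5000)
No <- sort(c(14000, 17000, 48452, 94632, 36541, 20000,
             100000, 46241, 78941, 32464, 69872, 90000))

# Index i such that seq.no[i] < No <= seq.no[i + 1]
idx_new <- findInterval(No, seq.no, left.open = TRUE)

# Fallback: -rev(seq.no) is still increasing, and the default
# left-closed lookup on the negated values mirrors the lookup above
idx_old <- length(seq.no) - findInterval(-No, -rev(seq.no))

# Stops with an error if the two index vectors differ
stopifnot(identical(idx_new, idx_old))
```

Both variants return index 0 for a value exactly equal to the lowest break (10000), while cut(..., include.lowest = TRUE) would place that value in the first bin. The sample data contains no such value, so the issue does not arise here.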