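I would like to summarize my dataset by grouping the variable age into 5-year age groups. Instead of single ages 0 1 2 3 4 5 6 ..., I want groups 0 5 10 15 etc., with 80 as my open-ended category. I could do this by creating a new variable and classifying every age by hand, but I'm sure there must be a faster way!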
a <- cbind(age=c(rep(seq(0, 90, by=1), 2)), value=rnorm(182))
Any ideas?
Answer 0 (score: 1)
library(dplyr)
# Cap ages at 80 (the open-ended group), then floor each age to a multiple of 5
a %>% as.data.frame() %>%
  group_by(age_group = pmin(age, 80) %/% 5 * 5) %>%
  summarize(avg_val = mean(value))
# A tibble: 17 x 2
age_group avg_val
<dbl> <dbl>
1 0 -0.151470805
2 5 0.553619149
3 10 0.198915973
4 15 -0.436646287
5 20 -0.024193193
6 25 0.102671120
7 30 -0.350059839
8 35 0.010762264
9 40 0.339268917
10 45 -0.056448481
11 50 0.002982158
12 55 0.348232262
13 60 -0.364050091
14 65 0.177551510
15 70 -0.178885909
16 75 0.664215782
17 80 -0.376929230
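The grouping key in this answer is plain integer arithmetic: pmin(age, 80) (a vectorised equivalent of sapply(age, min, 80)) caps every age at 80, %/% 5 performs integer division, and multiplying by 5 maps each age back to the lower bound of its 5-year group. A quick check on a few hand-picked ages (the vector below is illustrative, not from the question):

```r
# Illustrative ages, chosen to hit group boundaries and the 80+ cap
ages <- c(0, 4, 5, 37, 79, 80, 90)

# Cap at 80, then floor to the nearest lower multiple of 5.
# %/% binds tighter than *, so this is (pmin(ages, 80) %/% 5) * 5
age_group <- pmin(ages, 80) %/% 5 * 5
print(age_group)
# [1]  0  0  5 35 75 80 80
```

Ages 80 and 90 both land in group 80, which is what makes 80 the open-ended category.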
Answer 1 (score: 0)
Sample data
set.seed(1)
df <- data.frame(age=runif(1000)*100,
value=runif(1000))
Just append the maximum of your ages to seq(0,80,5) with c(..., max(age)) to get irregular breaks:
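library(dplyr)
df %>%
  # Breaks are 0, 5, ..., 80 plus max(age); the last interval is the open-ended group.
  # cut() builds right-closed intervals such as (0,5], so an age of exactly 0 would become NA
  mutate(age = cut(age, breaks=c(seq(0,80,5), max(age)))) %>%
  group_by(age) %>%
  summarise(value=mean(value))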
Output
age value
<fctr> <dbl>
1 (0,5] 0.4901119
2 (5,10] 0.5131055
3 (10,15] 0.5022297
4 (15,20] 0.4712481
5 (20,25] 0.5610872
6 (25,30] 0.4207203
7 (30,35] 0.5218318
8 (35,40] 0.4377102
9 (40,45] 0.5007616
10 (45,50] 0.4941768
11 (50,55] 0.5350272
12 (55,60] 0.5226967
13 (60,65] 0.5031688
14 (65,70] 0.4652641
15 (70,75] 0.5667020
16 (75,80] 0.4664898
17 (80,100] 0.4604779
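With the default right = TRUE, cut() builds right-closed intervals such as (0,5], so an age of exactly 0 falls outside every interval and becomes NA, and each group covers ages 1 to 5 rather than 0 to 4. For whole-number ages like those in the question, a sketch that uses left-closed intervals (right = FALSE) and Inf as the last break gives the 0, 5, 10, ..., 80+ groups directly. The sample data rebuilds the question's matrix as a data frame, and the labels ("0-4", ..., "80+") are an assumption about how you want the groups named. It uses base R's aggregate(), so it runs without dplyr.

```r
# Sample data shaped like the question's: ages 0-90, each appearing twice
set.seed(1)
a <- data.frame(age = rep(0:90, 2), value = rnorm(182))

# Left-closed 5-year intervals [0,5), [5,10), ..., [75,80), plus [80,Inf) for 80+
breaks <- c(seq(0, 80, by = 5), Inf)
# Hypothetical labels: "0-4", "5-9", ..., "75-79", "80+"
labels <- c(paste(seq(0, 75, by = 5), seq(4, 79, by = 5), sep = "-"), "80+")

a$age_group <- cut(a$age, breaks = breaks, labels = labels, right = FALSE)

# Mean value per age group, in factor-level order
res <- aggregate(value ~ age_group, data = a, FUN = mean)
print(res)
```

Using Inf as the last break keeps the open-ended group correct even if older ages are added later, whereas max(age) has to be recomputed from the data.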