I have this data frame:
> set.seed(100)
> df <- data.frame(Age = sample(18:70, 30, replace = TRUE),
Sex = sample(0:1, 30, replace =TRUE),
stringsAsFactors = FALSE)
> df
Age Sex
1 34 0
2 31 1
3 47 0
4 20 1
5 42 1
6 43 1
7 61 0
8 37 1
9 46 1
10 27 0
11 51 0
12 64 1
13 32 1
14 39 1
15 58 1
16 53 0
17 28 1
18 36 1
19 37 0
20 54 0
21 46 0
22 55 0
23 46 0
24 57 0
25 40 1
26 27 0
27 58 0
28 64 0
29 47 1
30 32 0
I want to create another column that gives the age range for each age:
> df_range <- data.frame(Age_Range = c("Lower than 26", "26 to 30", "31 to 35", "36 to 40", "41 to 45", "46 to 50", "51 to 55", "56 to 60", "61 to 65", "More than 65"),
stringsAsFactors = FALSE)
> df_range
Age_Range
1 Lower than 26
2 26 to 30
3 31 to 35
4 36 to 40
5 41 to 45
6 46 to 50
7 51 to 55
8 56 to 60
9 61 to 65
10 More than 65
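I know I can do this by building a huge table whose first column holds every possible age (for example -1000 to 1000, to guard against outliers) and whose second column holds the range for each age, or by writing an ifelse() for each range. But isn't there a more efficient way? Maybe something like VLOOKUP with TRUE (approximate match) in Excel?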
Answer 0: (score: 1)
To cut into ranges of equal width, you can use cut_width() from ggplot2:
library(ggplot2)
df$age_range = cut_width(df$Age, width = 5, boundary = 0)
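cut_width() produces interval-style labels rather than the labels asked for in the question. As a minimal sketch of an alternative, base R's cut() accepts explicit breaks and labels. The small `df` and the `breaks` and `labels` vectors below are illustrative assumptions, not the question's sampled data:

```r
# Hypothetical ages chosen to exercise the bin edges
df <- data.frame(Age = c(20, 25, 26, 30, 31, 47, 65, 66))

labels <- c("Lower than 26", "26 to 30", "31 to 35", "36 to 40", "41 to 45",
            "46 to 50", "51 to 55", "56 to 60", "61 to 65", "More than 65")

# 11 break points define 10 bins; with the default right = TRUE each bin
# is (lower, upper], so 30 falls in "26 to 30" and 31 in "31 to 35"
breaks <- c(-Inf, seq(25, 65, by = 5), Inf)

df$Age_Range <- cut(df$Age, breaks = breaks, labels = labels)
print(df)
```

Using -Inf and Inf as the outer breaks means no age can fall outside the bins, which replaces the "-1000 to 1000" guard from the question.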
Answer 1: (score: 0)
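Use findInterval, as @RLave suggested above (run without set.seed, so the ages in the output below differ from those in the question). You should look at the left.open and rightmost.closed arguments and adjust them as needed.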
# Bins are left-closed by default: [0,26), [26,31), ..., [66,200)
df$Agegroup <- findInterval(df$Age, c(0, 26, 31, 36, 41, 46, 51, 56, 61, 66, 200))
library(dplyr)
df <- df %>%
  rename(Age_range = Agegroup) %>%
  # Fix the levels at 1:10 so that an empty bin cannot shift the labels
  mutate(Age_range = factor(Age_range, levels = 1:10))
levels(df$Age_range) <- c("Lower than 26", "26 to 30", "31 to 35", "36 to 40",
                          "41 to 45", "46 to 50", "51 to 55", "56 to 60", "61 to 65",
                          "More than 65")  # Kinda tiring
Output (truncated)
Age Sex Age_range
1 29 1 26 to 30
2 52 0 51 to 55
3 38 1 36 to 40
4 56 0 56 to 60
5 33 1 31 to 35
6 20 1 Lower than 26
7 40 1 36 to 40
8 31 0 31 to 35
9 43 0 41 to 45
10 50 0 46 to 50
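The approximate-match VLOOKUP mentioned in the question can be sketched by using findInterval() as a row index into a lookup table. The `Lower` column (the smallest age in each range) and the sample `ages` vector are assumptions added for illustration; only `Age_Range` comes from the question:

```r
# Lookup table: one row per range, keyed by the lowest age it covers
df_range <- data.frame(
  Lower = c(-Inf, 26, 31, 36, 41, 46, 51, 56, 61, 66),
  Age_Range = c("Lower than 26", "26 to 30", "31 to 35", "36 to 40",
                "41 to 45", "46 to 50", "51 to 55", "56 to 60",
                "61 to 65", "More than 65"),
  stringsAsFactors = FALSE
)

ages <- c(20, 30, 40, 50, 70)  # hypothetical ages

# findInterval() returns, for each age, the index of the last Lower
# value that is <= the age, much like VLOOKUP(..., TRUE) on a sorted key
idx <- findInterval(ages, df_range$Lower)
result <- df_range$Age_Range[idx]
print(data.frame(Age = ages, Age_Range = result))
```

Keeping the boundaries next to the labels in one table means changing a range only requires editing a single row.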