Creating age ranges in R

Time: 2019-03-07 12:13:49

Tags: r dplyr data-manipulation

I have this data frame:

> set.seed(100)
> df <- data.frame(Age = sample(18:70, 30, replace = TRUE),
                 Sex = sample(0:1, 30, replace =TRUE),
                 stringsAsFactors = FALSE)
> df
   Age Sex
1   34   0
2   31   1
3   47   0
4   20   1
5   42   1
6   43   1
7   61   0
8   37   1
9   46   1
10  27   0
11  51   0
12  64   1
13  32   1
14  39   1
15  58   1
16  53   0
17  28   1
18  36   1
19  37   0
20  54   0
21  46   0
22  55   0
23  46   0
24  57   0
25  40   1
26  27   0
27  58   0
28  64   0
29  47   1
30  32   0

I want to add another column that gives the age range each age falls into:

> df_range <- data.frame(Age_Range = c("Lower than 26", "26 to 30", "31 to 35", "36 to 40", "41 to 45", "46 to 50", "51 to 55", "56 to 60", "61 to 65", "More than 65"),
                 stringsAsFactors = FALSE)
> df_range
       Age_Range
1  Lower than 26
2       26 to 30
3       31 to 35
4       36 to 40
5       41 to 45
6       46 to 50
7       51 to 55
8       56 to 60
9       61 to 65
10  More than 65

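I know I could do this by building a huge lookup table whose first column holds every possible age (say -1000 to 1000, to cover outliers) and whose second column holds the range for each age, or by writing an ifelse() for each range. But isn't there a more efficient way? Perhaps something like a VLOOKUP with TRUE (approximate match) in Excel?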

2 answers:

Answer 0 (score: 1)

To cut into evenly sized ranges, you can use cut_width() from ggplot2:
library(ggplot2)
df$age_range <- cut_width(df$Age, width = 5, boundary = 0)
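
If the column should carry the exact labels from the question rather than automatic interval labels, base R's `cut()` accepts explicit breaks and labels. The following is a minimal sketch and not part of the answer above; the names `breaks`, `range_labels`, and `ages` are chosen here for illustration, and it assumes ages are whole numbers, so that the right-closed bin (25,30] corresponds to "26 to 30".

```r
# Upper edge of each bin; -Inf and Inf catch anything outside 26..65.
breaks <- c(-Inf, 25, 30, 35, 40, 45, 50, 55, 60, 65, Inf)
range_labels <- c("Lower than 26", "26 to 30", "31 to 35", "36 to 40",
                  "41 to 45", "46 to 50", "51 to 55", "56 to 60",
                  "61 to 65", "More than 65")

# A few ages on or next to the bin edges (illustrative values).
ages <- c(18, 25, 26, 30, 31, 65, 66)

# right = TRUE (the default) closes each bin on the right: (25,30], (30,35], ...
age_range <- cut(ages, breaks = breaks, labels = range_labels, right = TRUE)
print(data.frame(Age = ages, Age_Range = age_range))
```

On the question's data frame the same call would be `df$Age_Range <- cut(df$Age, breaks = breaks, labels = range_labels)`.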

Answer 1 (score: 0)

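Using findInterval, as @RLave suggested above (no set.seed was used here, so the data differs from the question's). You should look at the left.open and rightmost.closed arguments and adjust them as needed: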

    library(dplyr)

    # Upper edge of each bin; left.open = TRUE gives (0,25], (25,30], ..., (65,200]
    breaks <- c(0, 25, 30, 35, 40, 45, 50, 55, 60, 65, 200)
    range_labels <- c("Lower than 26", "26 to 30", "31 to 35", "36 to 40",
                      "41 to 45", "46 to 50", "51 to 55", "56 to 60", "61 to 65",
                      "More than 65")  # kind of tedious to type out
    # Fixing levels to 1..10 keeps the labels aligned even if a bin is empty
    df <- df %>%
      mutate(Age_range = factor(findInterval(Age, breaks, left.open = TRUE),
                                levels = seq_along(range_labels),
                                labels = range_labels))

Output (truncated):

   Age Sex     Age_range
1   29   1      26 to 30
2   52   0      51 to 55
3   38   1      36 to 40
4   56   0      56 to 60
5   33   1      31 to 35
6   20   1 Lower than 26
7   40   1      36 to 40
8   31   0      31 to 35
9   43   0      41 to 45
10  50   0      46 to 50
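
To see why the boundary settings matter, it may help to compare how `findInterval` treats ages that sit exactly on a break. This is an illustrative sketch, not part of the original answer; `breaks` and `edge_ages` are names and values chosen here.

```r
breaks <- c(0, 25, 30, 35, 40, 45, 50, 55, 60, 65, 200)
edge_ages <- c(25, 26, 30, 31, 40, 65, 66)

# Default: bins are closed on the left, [25,30), [30,35), ...
default_idx <- findInterval(edge_ages, breaks)

# left.open = TRUE: bins are closed on the right, (25,30], (30,35], ...
left_open_idx <- findInterval(edge_ages, breaks, left.open = TRUE)

print(data.frame(Age = edge_ages,
                 default = default_idx,
                 left_open = left_open_idx))
```

With the right-closed variant, an age of exactly 30 lands in bin 2 ("26 to 30") and 40 in bin 4 ("36 to 40"), which matches the labels in the question; with the default, both slide into the next bin up.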