How do I assign a single factor level to multiple values in R?

Time: 2017-11-20 11:04:30

Tags: r

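Suppose I have a dataset of numbers between 0 and 20, and I want to create 3 age groups: 0~9 years old, 10~15 years old, and 16~20 years old.

How can I assign one of 3 factor levels to each number between 0 and 20, according to its value?

For example, values from 0 to 9 would get the "0~9 years old" level, values from 10 to 15 would get the "10~15 years old" level, and so on.

How can I do this in R?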

2 answers:

Answer 0: (score: 0)

The case_when function (from dplyr, loaded with the tidyverse) does the trick. Try the following:

library(tidyverse)

df <- tibble(age = 1:20)

df %>% 
  mutate(age_categories = case_when(age <= 9 ~ "0~9 years old",
                                    age <= 15 & age > 9 ~ "10~15 years old",
                                    age <= 20 & age > 15 ~ "16~20 years old",
                                    TRUE ~ "Other"))

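This returns the following (note that age_categories is a character column, <chr>; wrap the case_when call in factor() if you need an actual factor):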

# A tibble: 20 x 2
     age  age_categories
   <int>           <chr>
 1     1   0~9 years old
 2     2   0~9 years old
 3     3   0~9 years old
 4     4   0~9 years old
 5     5   0~9 years old
 6     6   0~9 years old
 7     7   0~9 years old
 8     8   0~9 years old
 9     9   0~9 years old
10    10 10~15 years old
11    11 10~15 years old
12    12 10~15 years old
13    13 10~15 years old
14    14 10~15 years old
15    15 10~15 years old
16    16 16~20 years old
17    17 16~20 years old
18    18 16~20 years old
19    19 16~20 years old
20    20 16~20 years old

Alternatively, you can convert the column to a factor and then merge its levels:

df$age_categories <- factor(df$age)

levels(df$age_categories) <- list(
  "0~9 years old" = 1:9,
  "10~15 years old" = 10:15,
  "16~20 years old" = 16:20
)

Answer 1: (score: 0)

You could use base::cut (R) or pandas.cut (Python):

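df <- data.frame(age = 0:20)
# Build the three labels, e.g. "from 0~9 yrs old"
labels <- sprintf("from %s yrs old", c("0~9", "10~15", "16~20"))
# Bins are [0,9], (9,15], (15,20]; include.lowest = TRUE keeps age 0 in the first bin
df$groups <- cut(
  df$age,
  breaks = c(0, 9, 15, 20),
  include.lowest = TRUE,
  labels = labels
)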
df$groups
# [1] from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old  
# [7] from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 10~15 yrs old from 10~15 yrs old
# [13] from 10~15 yrs old from 10~15 yrs old from 10~15 yrs old from 10~15 yrs old from 16~20 yrs old from 16~20 yrs old
# [19] from 16~20 yrs old from 16~20 yrs old from 16~20 yrs old
# Levels: from 0~9 yrs old from 10~15 yrs old from 16~20 yrs old

# Python equivalent using pandas
import pandas as pd
df = pd.DataFrame({'age': range(20)})  # ages 0..19; use range(21) to include 20
labels = ['from %s yrs old' % x for x in ['0~9', '10~15', '16~20']]
# Use df['groups'], not df.groups: attribute assignment does not create a column
df['groups'] = pd.cut(
    df['age'],
    bins=[0, 9, 15, 20],
    include_lowest=True, labels=labels)
df['groups']
#0       from 0~9 yrs old
#1       from 0~9 yrs old
#2       from 0~9 yrs old
#3       from 0~9 yrs old
#4       from 0~9 yrs old
#5       from 0~9 yrs old
#6       from 0~9 yrs old
#7       from 0~9 yrs old
#8       from 0~9 yrs old
#9       from 0~9 yrs old
#10    from 10~15 yrs old
#11    from 10~15 yrs old
#12    from 10~15 yrs old
#13    from 10~15 yrs old
#14    from 10~15 yrs old
#15    from 10~15 yrs old
#16    from 16~20 yrs old
#17    from 16~20 yrs old
#18    from 16~20 yrs old
#19    from 16~20 yrs old
#Name: groups, dtype: category
#Categories (3, object): [from 0~9 yrs old < from 10~15 yrs old < from 16~20 yrs old]
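
If it is unclear why 9 lands in the first group and 10 in the second, it may help to see the binning rule spelled out. Both cut functions treat the breaks 0, 9, 15, 20 as right-closed intervals, with the lowest edge included on request: [0, 9], (9, 15], (15, 20]. The sketch below reproduces that rule with only the Python standard library. The helper name assign_group is made up for illustration and is not part of pandas or R, and returning None for out-of-range values is an assumption that stands in for NA/NaN.

```python
from bisect import bisect_left

def assign_group(age, breaks=(0, 9, 15, 20), labels=("0~9", "10~15", "16~20")):
    """Hypothetical helper: label `age` using right-closed bins with the lowest edge included."""
    if age < breaks[0] or age > breaks[-1]:
        return None  # outside every bin (stands in for NA / NaN)
    # Index of the first upper edge that is >= age, so each upper edge
    # belongs to the bin it closes: 9 -> bin 0, 10 -> bin 1, 15 -> bin 1.
    idx = bisect_left(breaks[1:], age)
    return "from %s yrs old" % labels[idx]

groups = [assign_group(a) for a in range(21)]
print(groups[9], "|", groups[10], "|", groups[16])
```

For real data frames the built-in cut functions are the better choice; this is only meant to show which side of each break is closed.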