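Suppose I have a dataset of numbers between 0 and 20, and I want to create 3 age groups: 0~9 years old, 10~15 years old, and 16~20 years old.
How can I assign 3 factors to a set of numbers between 0 and 20 according to their values?
For example, values between 0 and 9 would be assigned the "0~9 years old" factor, values between 10 and 15 the "10~15 years old" factor, and so on.
How can I do this in R?
Answer 0 (score: 0)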
The case_when function (from dplyr, loaded with the tidyverse) does the trick. Try the following:
library(tidyverse)
df <- tibble(age = 1:20)
df %>%
  mutate(age_categories = case_when(age <= 9 ~ "0~9 years old",
                                    age <= 15 & age > 9 ~ "10~15 years old",
                                    age <= 20 & age > 15 ~ "16~20 years old",
                                    TRUE ~ "Other"))
This returns:
# A tibble: 20 x 2
age age_categories
<int> <chr>
1 1 0~9 years old
2 2 0~9 years old
3 3 0~9 years old
4 4 0~9 years old
5 5 0~9 years old
6 6 0~9 years old
7 7 0~9 years old
8 8 0~9 years old
9 9 0~9 years old
10 10 10~15 years old
11 11 10~15 years old
12 12 10~15 years old
13 13 10~15 years old
14 14 10~15 years old
15 15 10~15 years old
16 16 16~20 years old
17 17 16~20 years old
18 18 16~20 years old
19 19 16~20 years old
20 20 16~20 years old
Alternatively, you can convert the column to a factor and merge its levels:
df$age_categories <- factor(df$age)
levels(df$age_categories) <- list(
"0~9 years old" = 1:9,
"10~15 years old" = 10:15,
"16~20 years old" = 16:20
)
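For readers working in Python, a rough analog of the case_when approach above can be sketched with numpy.select, which also takes the first matching condition. This is only a sketch and not part of the original answer: the column names mirror the R example, and the 0 to 20 range is taken from the question rather than from the 1:20 tibble.

```python
import numpy as np
import pandas as pd

# Ages 0..20 inclusive, as described in the question (assumption).
df = pd.DataFrame({"age": range(0, 21)})
age = df["age"].to_numpy()

# Conditions are checked in order; the first one that is True wins,
# so each later condition only sees values the earlier ones rejected.
conditions = [age <= 9, age <= 15, age <= 20]
choices = ["0~9 years old", "10~15 years old", "16~20 years old"]

# Anything matching no condition falls back to "Other".
df["age_categories"] = np.select(conditions, choices, default="Other")
print(df)
```

Because the conditions are evaluated in order, the explicit lower bounds used in the R version are not needed here.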
Answer 1 (score: 0)
How about base::cut (R) or pandas.cut (Python)?
df <- data.frame(age = 0:20)
labels <- sprintf("from %s yrs old", c("0~9", "10~15", "16~20"))
# Bins are [0,9], (9,15], (15,20]; include.lowest = TRUE keeps 0 in the first bin
df$groups <- cut(
  df$age,
  breaks = c(0, 9, 15, 20),
  include.lowest = TRUE,
  labels = labels
)
df$groups
# [1] from 0~9 yrs old from 0~9 yrs old from 0~9 yrs old from 0~9 yrs old from 0~9 yrs old from 0~9 yrs old
# [7] from 0~9 yrs old from 0~9 yrs old from 0~9 yrs old from 0~9 yrs old from 10~15 yrs old from 10~15 yrs old
# [13] from 10~15 yrs old from 10~15 yrs old from 10~15 yrs old from 10~15 yrs old from 16~20 yrs old from 16~20 yrs old
# [19] from 16~20 yrs old from 16~20 yrs old from 16~20 yrs old
# Levels: from 0~9 yrs old from 10~15 yrs old from 16~20 yrs old
And in Python:
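import pandas as pd

df = pd.DataFrame({'age': range(20)})
labels = ['from %s yrs old' % x for x in ['0~9', '10~15', '16~20']]
# Bins are [0,9], (9,15], (15,20]; include_lowest=True keeps 0 in the first bin
groups = pd.cut(
    df['age'],
    bins=[0, 9, 15, 20],
    include_lowest=True,
    labels=labels)
# Use bracket assignment; df.groups = ... would not create a column
df['groups'] = groups
groups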
#0 from 0~9 yrs old
#1 from 0~9 yrs old
#2 from 0~9 yrs old
#3 from 0~9 yrs old
#4 from 0~9 yrs old
#5 from 0~9 yrs old
#6 from 0~9 yrs old
#7 from 0~9 yrs old
#8 from 0~9 yrs old
#9 from 0~9 yrs old
#10 from 10~15 yrs old
#11 from 10~15 yrs old
#12 from 10~15 yrs old
#13 from 10~15 yrs old
#14 from 10~15 yrs old
#15 from 10~15 yrs old
#16 from 16~20 yrs old
#17 from 16~20 yrs old
#18 from 16~20 yrs old
#19 from 16~20 yrs old
#Name: age, dtype: category
#Categories (3, object): [from 0~9 yrs old < from 10~15 yrs old < from 16~20 yrs old]
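Note that the Python example above uses range(20), which stops at 19. As a hedged variation (not from the original answer), the sketch below covers 0 through 20 and uses right=False with integer edges, so each bin is closed on the left and include_lowest is no longer needed. The shorter labels are just for illustration.

```python
import pandas as pd

# Ages 0..20 inclusive, matching the range in the question.
df = pd.DataFrame({"age": range(0, 21)})
labels = ["0~9 years old", "10~15 years old", "16~20 years old"]

# With right=False the bins are [0, 10), [10, 16), [16, 21).
df["groups"] = pd.cut(df["age"], bins=[0, 10, 16, 21], right=False, labels=labels)

# Count how many ages landed in each group.
counts = df["groups"].value_counts().to_dict()
print(counts)
```

Values outside the outermost edges would become NaN rather than raise an error, so it is worth checking for missing values if the data may fall outside 0 to 20.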