How do I convert the values of a gender column to numeric values in R?

Time: 2019-10-17 19:52:32

Tags: r data.table

I have a data frame with a "gender" column, but unfortunately the column is filled with free text, for example: male, female, m, f, Male, Female, Demiguy, none, Trans, Cisgender, non-binary, She/her/they/them, Other, Cis, SWM, NB, Genderfluid, Nonbinary/femme, and so on.

I want to recode these values as male = 0, female = 1, other = 2.

I tried several data.table approaches but could not figure it out.

2 answers:

Answer 0: (score: 1)

You will probably have to do something like this. Free text is a pain.

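library(dplyr)
# Spellings to treat as male / female; extend these as new variants turn up.
male_terms <- c("Male","male","man","Man","m")
female_terms <- c("Female","female","woman","Woman","f")
# x is the data frame holding the free-text `gender` column.
# Anything not listed in either vector (including NA) becomes 2 ("other").
x <- x %>%
    mutate(gender_bin = if_else(gender %in% male_terms,0,
                          if_else(gender %in% female_terms,1,2)))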
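Listing every capitalisation by hand gets tedious, so one option is to normalise the text before matching. The following is a minimal base-R sketch of that idea, not part of the answer above: the helper name `gender_to_code` and the sample vector are hypothetical, and the term lists are simply the lowercase versions of the ones used in the answer.

```r
# Hypothetical helper (not from the answer): map free-text gender to 0/1/2.
gender_to_code <- function(gender) {
  # Lowercase and trim so "Male", " male " and "MALE" all compare equal.
  g <- tolower(trimws(gender))
  male_terms <- c("male", "man", "m")
  female_terms <- c("female", "woman", "f")
  # Start everything as "other" (2), then overwrite the recognised values.
  code <- rep(2L, length(g))
  code[g %in% male_terms] <- 0L
  code[g %in% female_terms] <- 1L
  code
}

# Made-up sample values for illustration.
sample_gender <- c("Male", " female ", "M", "NB")
gender_to_code(sample_gender)
```

The same function can be used inside `mutate()`, for example `mutate(gender_bin = gender_to_code(gender))`, if you prefer to stay in dplyr.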

Answer 1: (score: 0)

Maybe you can try adapting this:

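# Default every row to "other" (2).
DT[, GENDER := 2]
# Male: exact short forms, or any value containing "male" (this also hits "female" for now).
DT[toupper(X) %chin% c("M","MAN","BOY") | grepl("male", X, ignore.case=TRUE), GENDER := 0]
# Female: must run after the male step so that it overwrites rows such as "female".
DT[toupper(X) %chin% c("F","WOMAN","GIRL") | grepl("female", X, ignore.case=TRUE), GENDER := 1]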

Data:

library(data.table)
DT <- data.table(X=c("Malel","male","female","m","f","Male","Female","Demiguy",
    "none","Trans","Cisgender","non-binary","She/her/they/them","Other","Cis",
    "SWM","NB","Genderfluid","Nonbinary/femme"))
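Note that the pattern "male" also matches "female", so the approach above depends on the female assignment running last. The following base-R sketch (an illustration, not the answerer's code) shows the overlap and one way to make the two conditions mutually exclusive so that the order no longer matters. The variable names `is_female`, `is_male` and the shortened sample vector are assumptions made for this example.

```r
# A shortened copy of the sample values, as a plain character vector.
X <- c("Malel", "male", "female", "m", "f", "Male", "Female", "Demiguy", "NB")

# The substring "male" is contained in "female", so this is TRUE for both.
grepl("male", c("male", "female"), ignore.case = TRUE)

# Decide "female" first, then require that "male" rows are not already female.
is_female <- toupper(X) %in% c("F", "WOMAN", "GIRL") |
  grepl("female", X, ignore.case = TRUE)
is_male <- !is_female &
  (toupper(X) %in% c("M", "MAN", "BOY") | grepl("male", X, ignore.case = TRUE))

# 0 = male, 1 = female, 2 = other.
GENDER <- ifelse(is_male, 0, ifelse(is_female, 1, 2))
data.frame(X, GENDER)
```

Because substring matching is loose, it also picks up misspellings such as "Malel", which may or may not be what you want for your data.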