I have a data frame with a "gender" column, but unfortunately the column is filled with free text, for example: male, female, m, f, Male, Female, Demiguy, none, Trans, Cisgender, non-binary, She/her/they/them, Other, Cis, SWM, NB, Genderfluid, Nonbinary/femme, and so on.
I want to recode these values as male=0, female=1, and other=2.
I tried several data.table approaches but couldn't work it out.
Answer 0 (score: 1)
You will probably have to do something like this. Free text is a pain.
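library(dplyr)

# Exact spellings to treat as male / female; %in% matching is case-sensitive.
male_terms <- c("Male","male","man","Man","m")
female_terms <- c("Female","female","woman","Woman","f")

# x is the data frame holding the free-text gender column.
# Anything not listed above falls through to 2 (other).
x <- x %>%
  mutate(gender_bin = if_else(gender %in% male_terms, 0,
                      if_else(gender %in% female_terms, 1, 2)))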
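Because the term lists above have to spell out every capitalization, it may be simpler to normalize the text before matching. The following is a minimal sketch in base R, not part of the original answer: the helper name `recode_gender` and its default term lists are hypothetical, and it assumes that trimming whitespace and lowercasing is acceptable for your data.

```r
# Hypothetical helper: map free-text values to 0 (male), 1 (female), 2 (other).
recode_gender <- function(x,
                          male_terms = c("male", "man", "m"),
                          female_terms = c("female", "woman", "f")) {
  # Normalize case and surrounding whitespace so "Male" and " male " compare equal.
  cleaned <- tolower(trimws(x))
  # Start with every value coded as "other", then overwrite the exact matches.
  out <- rep(2L, length(cleaned))
  out[cleaned %in% male_terms] <- 0L
  out[cleaned %in% female_terms] <- 1L
  out
}

codes <- recode_gender(c("Male", " f ", "NB"))
print(codes)
```

With dplyr this could be used as `x %>% mutate(gender_bin = recode_gender(gender))`, assuming the column is named `gender` as in the answer above.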
Answer 1 (score: 0)
Maybe you can adapt this:
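# Default every row to 2 (other).
DT[, GENDER := 2]
# Male: exact short forms, or any value containing "male".
# This pattern also matches "female"; the next statement overrides those rows,
# so the order of these two statements matters.
DT[toupper(X) %chin% c("M","MAN","BOY") | grepl("male", X, ignore.case=TRUE), GENDER := 0]
# Female: exact short forms, or any value containing "female".
DT[toupper(X) %chin% c("F","WOMAN","GIRL") | grepl("female", X, ignore.case=TRUE), GENDER := 1]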
Data (create this before running the statements above):
library(data.table)
DT <- data.table(X=c("Malel","male","female","m","f","Male","Female","Demiguy",
                     "none","Trans","Cisgender","non-binary","She/her/they/them","Other","Cis",
                     "SWM","NB","Genderfluid","Nonbinary/femme"))
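A note on the `grepl("male", ...)` pattern in this answer: "female" contains "male" as a substring, so the male rule also hits female rows, and the result is only correct because the female rule runs afterwards. The sketch below, in base R on a few of the sample values, compares the substring pattern with an anchored `^male` pattern. The anchored variant is an illustration rather than the answer's own method, and it assumes the values you want to catch start with "male"; it would miss something like "cis male".

```r
# A few sample values taken from the data above.
X <- c("Malel", "male", "female", "m", "f", "Male", "Female", "Demiguy", "NB")

# Substring match: TRUE for any value containing "male", including "female".
sub_male <- grepl("male", X, ignore.case = TRUE)

# Anchored match: TRUE only for values that start with "male".
anch_male <- grepl("^male", X, ignore.case = TRUE)

# Values the substring pattern catches but the anchored one does not.
overlap <- X[sub_male & !anch_male]
print(overlap)
```

With an anchored pattern the male and female rules no longer depend on the order in which they run, at the cost of being stricter about what counts as a match.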