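I have a logical column, Self_Employed, with values TRUE and FALSE. It also has missing values, which stand for "An Employee", i.e. someone who is not self-employed. I want to add a "Missing" category to the column.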
class(df$Self_Employed)
[1] "logical"
levels(df$Self_Employed)
NULL
sum(is.na(df$Self_Employed))
[1] 210
table(df$Self_Employed)
FALSE  TRUE
 1561   271
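The class is "logical", the levels are NULL, there are 210 missing values, and the table shows only the FALSE and TRUE counts.
Imputing the missing values
First I converted the column to a factor and then tried to impute the missing values, but they were not filled in: the column still shows NA, and the levels are still only TRUE and FALSE.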
df$Self_Employed <- as.factor(df$Self_Employed)
levels(df$Self_Employed)[levels(df$Self_Employed)=="" ] <- "SE_Missing"
levels(df$Self_Employed)
[1] "FALSE" "TRUE"
The levels still show only TRUE and FALSE, and is.na still reports the same 210 missing values.
df$Self_Employed <- factor(df$Self_Employed,levels=c('FALSE','TRUE',''),labels=c('Yes','No','SE_Missing'))
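How do I fill in the missing values in a factor?
I need to convert TRUE to "Yes", FALSE to "No", and NA to "SE_Missing".
Answer 0 (score: 2)
I don't think you need a factor for this. Here is an example using a dummy dataset: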
library(dplyr)
df %>%
  mutate(b = case_when(b ~ "Yes",
                       !b ~ "No",
                       TRUE ~ "SE_Missing"))
# a b
#1 1 Yes
#2 2 Yes
#3 3 No
#4 4 SE_Missing
#5 5 No
#6 6 SE_Missing
Or use a nested ifelse, which can also be integrated into mutate:
with(df, ifelse(is.na(b), "SE_Missing", ifelse(b, "Yes", "No")))
#[1] "Yes" "Yes" "No" "SE_Missing" "No" "SE_Missing"
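If the result does need to be a factor with its own missing level, as the question asks, one possible base R sketch is below. It rebuilds the same dummy `df` so that it runs on its own; the object name `f` is only illustrative. It relies on `addNA()` to turn NA into a real level, which can then be renamed.

```r
# Same dummy data as in the Data section of this answer
df <- data.frame(a = 1:6, b = c(TRUE, TRUE, FALSE, NA, FALSE, NA))

# Map TRUE -> "Yes" and FALSE -> "No"; NA is still NA at this point
f <- factor(df$b, levels = c(TRUE, FALSE), labels = c("Yes", "No"))

# addNA() makes NA an actual level, so it can be renamed like any other
f <- addNA(f)
levels(f)[is.na(levels(f))] <- "SE_Missing"

levels(f)      # "Yes" "No" "SE_Missing"
sum(is.na(f))  # 0
table(f)       # counts per category, including SE_Missing
```

For the real data, the same steps would presumably be applied to `df$Self_Employed` in place of `df$b`.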
Data
df <- data.frame(a = 1:6, b = c(TRUE, TRUE, FALSE, NA, FALSE, NA))
# a b
#1 1 TRUE
#2 2 TRUE
#3 3 FALSE
#4 4 NA
#5 5 FALSE
#6 6 NA
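As for why the attempt in the question leaves the NAs untouched, the likely reason is that `as.factor()` does not create a level for NA, so there is no `""` level to rename, and listing `''` in `levels=` does not match NA either. A small sketch on a made-up vector `x` (not the asker's data) illustrates this:

```r
# Hypothetical three-element vector standing in for the real column
x <- c(TRUE, FALSE, NA)
f <- as.factor(x)

levels(f)             # only "FALSE" and "TRUE"; NA is not a level
any(levels(f) == "")  # FALSE, so renaming the "" level changes nothing
sum(is.na(f))         # 1, the NA is still there
```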