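I want to replace the NAs in selected columns with the last value among each column's levels, but doing so keeps converting the columns to character: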
table(sapply(cop2014, class))
factor numeric
400 116
varToCat = c("V21A","A3","Escolari","A17","B8","C5B","RamaEmpPri","C11","C16B",
"C16C","D4B","D4C","RamaEmpSec","RamaUltEmpCesant","G12",
"RamaFuerzaTrab","OcupFuerzaTrab","ActNoMer")
cop2014[,varToCat] = sapply(cop2014[,varToCat],
function(col) replace(col, is.na(col), last(levels(col))))
When I look at the classes of the variables, I can see that they have changed.
table(sapply(cop2014, class))
character factor numeric
18 382 116
Any hints as to why this happens? I just want to replace the NAs with a valid factor level (in this case, the last one among the levels).
Answer 0 (score: 1)
sapply simplifies its result to a matrix, and a matrix can hold only a single class. So, instead of sapply, use lapply:
df1[] <- lapply(df1, function(x) replace(x, is.na(x), last(levels(x))))
str(df1)
#'data.frame': 10 obs. of 2 variables:
#$ v1: Factor w/ 3 levels "B","D","E": 1 1 3 2 2 3 1 3 3 1
#$ v2: Factor w/ 5 levels "A","B","C","D",..: 4 3 5 5 2 5 2 1 4 1
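If we look at the output of sapply, it is a matrix, which can hold only a single class. During the conversion to a matrix, the factor attributes are lost and the values are converted to character: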
sapply(df1, function(x) replace(x, is.na(x), last(levels(x))))
# v1 v2
# [1,] "B" "D"
# [2,] "B" "C"
# [3,] "E" "E"
# [4,] "D" "E"
# [5,] "D" "B"
# [6,] "E" "E"
# [7,] "B" "B"
# [8,] "E" "A"
# [9,] "E" "D"
#[10,] "B" "A"
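To check the difference directly, here is a minimal self-contained sketch. The data frame `toy` and the helper `fill_last` are made up for illustration (they are not the asker's `cop2014`), and `tail(levels(x), 1)` stands in for `last(levels(x))` so that only base R is needed.

```r
# Hypothetical toy data, not the asker's cop2014
toy <- data.frame(
  v1 = factor(c("B", NA, "E", "D")),
  v2 = factor(c("A", "C", NA, "B"))
)

# Replace NA with the last level of the factor
# (base R stand-in for last(levels(x)))
fill_last <- function(x) replace(x, is.na(x), tail(levels(x), 1))

# sapply simplifies the list of factors to a single character matrix
via_sapply <- sapply(toy, fill_last)

# lapply returns a list; assigning into toy[]-style keeps the data frame
# structure, so every column remains a factor
via_lapply <- toy
via_lapply[] <- lapply(toy, fill_last)

is.matrix(via_sapply)       # the simplified result is a matrix
typeof(via_sapply)          # its storage type
sapply(via_lapply, class)   # class of each column after lapply
```

Assigning with `via_lapply[] <- ...` rather than `via_lapply <- ...` is what keeps the result a data frame instead of a plain list.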
Besides lapply, we can also use mutate_at from the tidyverse:
library(dplyr)
cop2014 %>%
mutate_at(vars(varToCat), funs(replace(., is.na(.), last(levels(.)))))
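In dplyr 1.0.0 and later, `mutate_at()` is superseded by `across()`, and `funs()` has been deprecated since dplyr 0.8.0. A rough equivalent, assuming a recent dplyr is installed, might look like the sketch below. The data frame `toy` and the vector `cols` are hypothetical stand-ins for `cop2014` and `varToCat`.

```r
library(dplyr)

# Hypothetical toy data and column selection,
# standing in for cop2014 and varToCat
toy <- data.frame(
  v1 = factor(c("B", NA, "E", "D")),
  v2 = factor(c("A", "C", NA, "B")),
  n  = c(1, 2, NA, 4)
)
cols <- c("v1", "v2")

# Fill NA with the last factor level, only in the selected columns;
# the numeric column n is left untouched
filled <- toy %>%
  mutate(across(all_of(cols), ~ replace(.x, is.na(.x), last(levels(.x)))))

sapply(filled, class)
```

Wrapping the character vector in `all_of()` makes it explicit that `cols` is an external vector of column names rather than a column of the data frame.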
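Data used in the examples:
f1 <- function(n) sample(c(LETTERS[1:5], NA), n, replace = TRUE)
set.seed(24)
# stringsAsFactors = TRUE is required on R >= 4.0.0, where data.frame()
# no longer converts character columns to factors by default
df1 <- data.frame(v1 = f1(10), v2 = f1(10), stringsAsFactors = TRUE)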