A common way to fill in NA values in the data frame loan is the following:
for (i in 1:ncol(loan))
{
  if (is.character(loan[, i]))
  {
    loan[is.na(loan[, i]), i] <- "missing"
  }
  if (is.numeric(loan[, i]))
  {
    loan[is.na(loan[, i]), i] <- 9999
  }
}
However, if the loan dataset is a tibble, the approach above does not work: is.character(loan[, i]) is always FALSE, and is.numeric(loan[, i]) is also FALSE. The class of the loan dataset is:
> class(loan)
[1] "tbl_df" "tbl" "data.frame"
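To use the for loop above for missing-value imputation, I first have to convert loan to a plain data frame with as.data.frame() and then run the loop.
Is it possible to fill the missing values by operating on the tibble directly, without converting it to a data.frame first?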
Answer 0 (score: 2)
We can use tidyverse syntax to do this:
library(tidyverse)
# funs() has been deprecated since dplyr 0.8.0, so a ~ lambda is used instead
loan %>%
  mutate_if(is.character, ~ replace(., is.na(.), "missing")) %>%
  mutate_if(is.numeric, ~ replace(., is.na(.), 9999))
# A tibble: 20 × 3
# Col1 Col2 Col3
# <chr> <dbl> <chr>
#1 a 9999 A
#2 a 2 A
#3 d 3 A
#4 c 9999 missing
#5 c 1 missing
#6 e 3 missing
#7 a 9999 A
#8 d 2 A
#9 d 3 A
#10 a 9999 A
#11 c 1 A
#12 b 1 C
#13 d 1 A
#14 d 9999 B
#15 a 4 B
#16 e 1 C
#17 a 3 A
#18 missing 3 A
#19 c 3 missing
#20 missing 4 missing
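On dplyr 1.0.0 or later, the same idea can also be written with across() and where(). The following is a minimal sketch under that version assumption; loan_demo is a small hypothetical stand-in for loan, used only so the result is easy to check by eye.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `loan`, not the data from the question
loan_demo <- tibble(
  Col1 = c("a", NA, "c"),
  Col2 = c(1, NA, 3),
  Col3 = c(NA, "B", "C")
)

# Fill character columns with "missing" and numeric columns with 9999
filled <- loan_demo %>%
  mutate(
    across(where(is.character), ~ replace(.x, is.na(.x), "missing")),
    across(where(is.numeric), ~ replace(.x, is.na(.x), 9999))
  )

filled
```

The result is still a tibble, so no conversion to data.frame is needed at any point.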
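Because the dataset is a tibble, extracting a single column with [ does not convert it to a vector (the result is still a one-column tibble, so is.character() and is.numeric() return FALSE). We need [[ instead: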
for (i in 1:ncol(loan)) {
  # [[ returns the column as a plain vector, for tibbles and data frames alike
  if (is.character(loan[[i]])) {
    loan[is.na(loan[[i]]), i] <- "missing"
  } else if (is.numeric(loan[[i]])) {
    loan[is.na(loan[[i]]), i] <- 9999
  }
}
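If the loop is needed in more than one place, it can be wrapped in a function that treats tibbles and plain data frames the same way. This is only a sketch: fill_missing, chr_fill and num_fill are hypothetical names introduced here, not part of any package. It pulls each column out with [[, fills it, and assigns it back.

```r
library(tibble)

# Hypothetical helper: fills NA in character and numeric columns
fill_missing <- function(df, chr_fill = "missing", num_fill = 9999) {
  for (i in seq_along(df)) {
    col <- df[[i]]                    # [[ always yields the column vector
    if (is.character(col)) {
      col[is.na(col)] <- chr_fill
    } else if (is.numeric(col)) {
      col[is.na(col)] <- num_fill
    }
    df[[i]] <- col                    # write the filled column back
  }
  df
}

# Small made-up inputs to show both classes are handled
demo_df  <- data.frame(x = c("a", NA), y = c(NA, 2), stringsAsFactors = FALSE)
demo_tbl <- as_tibble(demo_df)

res_df  <- fill_missing(demo_df)
res_tbl <- fill_missing(demo_tbl)
```

Working on the extracted vector avoids indexing the tibble by row and column at all, which sidesteps the difference between [ on a data.frame and [ on a tibble.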
To understand the problem, we only need to look at the output of the two kinds of extraction:
head(is.na(loan[,1]))
# Col1
#[1,] FALSE
#[2,] FALSE
#[3,] FALSE
#[4,] FALSE
#[5,] FALSE
#[6,] FALSE
head(is.na(loan[[1]]))
#[1] FALSE FALSE FALSE FALSE FALSE FALSE
In the for loop, the row index is a logical matrix with one column in the first case and a logical vector in the second, and that is what makes the difference.
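The difference between the two extraction operators can be checked directly. The snippet below is a small illustration with made-up objects (demo_tbl and demo_df are hypothetical, not the loan data): it compares what [ , 1] returns for a data.frame and for a tibble, and what [[1]] returns for the tibble.

```r
library(tibble)

demo_tbl <- tibble(Col1 = c("a", NA, "c"))
demo_df  <- as.data.frame(demo_tbl)

# On a data.frame, [ , 1] drops to a plain character vector
df_is_chr <- is.character(demo_df[, 1])

# On a tibble, [ , 1] is still a one-column tibble, so the type check fails
tbl_single_is_chr <- is.character(demo_tbl[, 1])

# [[1]] returns the underlying vector in both cases
tbl_double_is_chr <- is.character(demo_tbl[[1]])

# is.na() on the one-column tibble gives a matrix, on the vector a vector
na_matrix_dim <- dim(is.na(demo_tbl[, 1]))
na_vector_len <- length(is.na(demo_tbl[[1]]))

c(df_is_chr, tbl_single_is_chr, tbl_double_is_chr)
```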
Data used in the examples:
set.seed(24)
loan <- as_tibble(data.frame(
  Col1 = sample(c(NA, letters[1:5]), 20, replace = TRUE),
  Col2 = sample(c(NA, 1:4), 20, replace = TRUE),
  Col3 = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = FALSE
))