Filling NA values in a tibble without converting it to a data.frame

Time: 2017-04-30 13:38:41

Tags: r

A common way to fill NA values in a data frame named loan is the following:

for (i in 1:ncol(loan)) {
  if (is.character(loan[, i])) {
    loan[is.na(loan[, i]), i] <- "missing"
  }
  if (is.numeric(loan[, i])) {
    loan[is.na(loan[, i]), i] <- 9999
  }
}

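However, if the loan dataset is a tibble, the approach above does not work: is.character(loan[, i]) is always FALSE, and is.numeric(loan[, i]) is FALSE as well. The class of the loan dataset is: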

> class(loan)
[1] "tbl_df"     "tbl"        "data.frame"

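To fill the missing values with the for loop above, I first have to convert loan to a data frame with as.data.frame() and then run the loop.

Is it possible to fill the missing values by working on the tibble directly, without converting it to a data.frame first?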

1 Answer:

Answer 0: (score: 2)

We can do this with tidyverse syntax:

library(tidyverse)
loan %>%
    # funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
    mutate_if(is.character, ~ replace(., is.na(.), "missing")) %>%
    mutate_if(is.numeric, ~ replace(., is.na(.), 9999))
# A tibble: 20 × 3
#      Col1  Col2    Col3
#     <chr> <dbl>   <chr>
#1        a  9999       A
#2        a     2       A
#3        d     3       A
#4        c  9999 missing
#5        c     1 missing
#6        e     3 missing
#7        a  9999       A
#8        d     2       A
#9        d     3       A
#10       a  9999       A
#11       c     1       A
#12       b     1       C
#13       d     1       A
#14       d  9999       B
#15       a     4       B
#16       e     1       C
#17       a     3       A
#18 missing     3       A
#19       c     3 missing
#20 missing     4 missing
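
In dplyr 1.0.0 and later, scoped verbs such as mutate_if() are superseded by across(). The following is a minimal sketch of the same replacement in that style, assuming dplyr >= 1.0.0 is installed. The small tibble `df` is made-up example data, not the loan dataset from the question.

```r
library(dplyr)

# Hypothetical example data, not the loan dataset from the question
df <- tibble(
  Col1 = c("a", NA, "c"),
  Col2 = c(1, NA, 3)
)

filled <- df %>%
  mutate(
    # character columns: NA becomes "missing"
    across(where(is.character), ~ replace(.x, is.na(.x), "missing")),
    # numeric columns: NA becomes 9999
    across(where(is.numeric), ~ replace(.x, is.na(.x), 9999))
  )
```

The result stays a tibble throughout, so no as.data.frame() round trip is needed.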

Because the dataset is a tibble, extracting a column with [ does not drop it to a vector; we need [[ instead:

for (i in 1:ncol(loan)) {
  # [[ returns the column as a vector, so the type checks work on a tibble
  if (is.character(loan[[i]])) {
    loan[is.na(loan[[i]]), i] <- "missing"
  }
  if (is.numeric(loan[[i]])) {
    loan[is.na(loan[[i]]), i] <- 9999
  }
}
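
As a variation on this loop (a sketch, not part of the original answer), the replacement can also be done on the extracted column itself, so that both the type check and the assignment go through [[. The tibble `df` is made-up example data.

```r
library(tibble)

# Hypothetical example data, not the loan dataset from the question
df <- tibble(Col1 = c("a", NA, "c"), Col2 = c(1, NA, 3))

for (nm in names(df)) {
  col <- df[[nm]]                 # [[ returns the column as a plain vector
  if (is.character(col)) {
    col[is.na(col)] <- "missing"
  } else if (is.numeric(col)) {
    col[is.na(col)] <- 9999
  }
  df[[nm]] <- col                 # write the whole column back
}
```

Looping over column names instead of positions is only a stylistic choice here; either form works with [[.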

To understand the problem, we only need to look at the output of the two kinds of extraction:

head(is.na(loan[,1]))
#      Col1
#[1,] FALSE
#[2,] FALSE
#[3,] FALSE
#[4,] FALSE
#[5,] FALSE
#[6,] FALSE

head(is.na(loan[[1]]))
#[1] FALSE FALSE FALSE FALSE FALSE FALSE

In the for loop, the row index is a logical matrix with one column in the first case and a logical vector in the second, and that is what makes the difference.
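
The same difference explains why the type checks in the question fail. A short sketch with made-up data `df` (hypothetical, not the loan dataset): single-bracket extraction from a tibble keeps the tibble class, so is.character() sees a tibble and not a character vector.

```r
library(tibble)

# Hypothetical example data, not the loan dataset from the question
df <- tibble(Col1 = c("a", NA), Col2 = c(1, NA))

class(df[, 1])            # still a tibble
is.character(df[, 1])     # FALSE: a tibble is not a character vector
is.character(df[[1]])     # TRUE: [[ returns the underlying vector

# A plain data.frame drops a single column to a vector by default
is.character(as.data.frame(df)[, 1])  # TRUE
```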

Data

set.seed(24)
loan <- as_tibble(data.frame(
  Col1 = sample(c(NA, letters[1:5]), 20, replace = TRUE),
  Col2 = sample(c(NA, 1:4), 20, replace = TRUE),
  Col3 = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = FALSE
))