Question

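I'm doing some data cleaning on data stored in a tibble, and I keep confusing myself: I convert some empty-string observations to NA, and then when I call summary(df) to check my work, those observations seem to have vanished. It appears that with tibble(), NAs are only reported for non-character columns. Why is this? Is it intentional? And if so, why?

Minimal example: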

tdf <- tibble::tibble(a = c("apple", "pear", NA), 
                      b = 1:3, c = factor(letters[1:3]))
# We see that the NA in the 'chr' column is not displayed
summary(tdf) 
#>       a                   b       c    
#>  Length:3           Min.   :1.0   a:1  
#>  Class :character   1st Qu.:1.5   b:1  
#>  Mode  :character   Median :2.0   c:1  
#>                     Mean   :2.0        
#>                     3rd Qu.:2.5        
#>                     Max.   :3.0
# But NA in other column types will be
tdf[3, 2:3] <- NA
summary(tdf)
#>       a                   b           c    
#>  Length:3           Min.   :1.00   a   :1  
#>  Class :character   1st Qu.:1.25   b   :1  
#>  Mode  :character   Median :1.50   c   :0  
#>                     Mean   :1.50   NA's:1  
#>                     3rd Qu.:1.75           
#>                     Max.   :2.00           
#>                     NA's   :1

# This behavior is not the same with data.frame
ddf <- data.frame(a = c("apple", "pear", NA), 
                  b = 1:3, c = factor(letters[1:3]))
summary(ddf)
#>      a           b       c    
#>  apple:1   Min.   :1.0   a:1  
#>  pear :1   1st Qu.:1.5   b:1  
#>  NA's :1   Median :2.0   c:1  
#>            Mean   :2.0        
#>            3rd Qu.:2.5        
#>            Max.   :3.0
ddf[3, 2:3] <- NA
summary(ddf)
#>      a           b           c    
#>  apple:1   Min.   :1.00   a   :1  
#>  pear :1   1st Qu.:1.25   b   :1  
#>  NA's :1   Median :1.50   c   :0  
#>            Mean   :1.50   NA's:1  
#>            3rd Qu.:1.75           
#>            Max.   :2.00           
#>            NA's   :1

Created on 2018-03-01 by the reprex package (v0.2.0).

Answer 1

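This is because when you create column 'a' in the data.frame, it is converted to a factor (see stringsAsFactors, which defaulted to TRUE before R 4.0.0). When you create the column in your tibble, it stays a character column.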

class(ddf$a)
"factor"

class(tdf$a)
"character"

If you create the data.frame with stringsAsFactors = FALSE, you will see that it behaves like the tibble.

ddf <- data.frame(a = c("apple", "pear", NA), 
              b = 1:3, c = factor(letters[1:3]), stringsAsFactors = FALSE)

class(ddf$a)
"character"
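To see why the column type matters, it can help to call summary() on the vectors directly, since summary() on a data frame summarises each column in turn. The following is a minimal base-R sketch; the names x, chr_summary, fct_summary, df and is_chr are illustrative and not from the original post.

```r
x <- c("apple", "pear", NA)

# Character vector: summary() reports only Length, Class and Mode,
# so missing values are never counted
chr_summary <- summary(x)
print(names(chr_summary))

# Factor: summary() counts each level and appends an "NA's" entry
fct_summary <- summary(factor(x))
print(fct_summary)

# One possible workaround: convert character columns to factors
# before summarising (stringsAsFactors is set explicitly so the
# result does not depend on the R version)
df <- data.frame(a = x, b = 1:3, stringsAsFactors = FALSE)
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)
summary(df)
```

The same conversion should work on a tibble, since it only relies on list-style column assignment, but that is an assumption rather than something shown in the answer.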

Answer 2

Why?
Probably a design choice.

How to work around it:
You can use lapply and table() with the argument useNA = "always" or "ifany":

tdf <- tibble::tibble(a = c("apple", "pear", NA, NA), 
                      b = 1:4, c = factor(letters[1:4]),
                      d = c("apple", "pear", "peach", NA))
lapply(tdf, function(x){table(x, useNA = "always")})
# $a
# x
# apple  pear  <NA> 
#     1     1     2 
# $b
# x
#    1    2    3    4 <NA> 
#    1    1    1    1    0 
# $c
# x
#    a    b    c    d <NA> 
#    1    1    1    1    0 
# $d
# x
# apple peach  pear  <NA> 
#     1     1     1     1 

You can also check individual columns by using dplyr::tally after grouping.
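The grouping approach can be sketched as follows, assuming the tibble and dplyr packages are installed. The variable names grouped, counts and na_counts are illustrative, and the final colSums(is.na()) line is an additional base-R alternative that is not part of the original answer.

```r
tdf <- tibble::tibble(a = c("apple", "pear", NA, NA),
                      b = 1:4, c = factor(letters[1:4]),
                      d = c("apple", "pear", "peach", NA))

# Group by the character column, then count rows per group;
# NA forms its own group, so missing values show up in the result
grouped <- dplyr::group_by(tdf, a)
counts <- dplyr::tally(grouped)
print(counts)

# Base-R alternative: number of NAs in every column at once
na_counts <- colSums(is.na(tdf))
print(na_counts)
```

Here counts has one row per distinct value of a, including a row for NA with n equal to 2.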

Why doesn't summary(tibble()) report NAs in chr columns?
