Transpose data frame variables and add null and unique counts in [r]

Time: 2017-06-17 02:44:53

Tags: r dplyr tidyverse

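I am trying to build a summary table of a data frame, like DataProfile below. The idea is to turn each column into a row, add variables for the count, nulls, non-nulls, and unique values, and then add further variables derived from those with mutate.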

It seems like there should be a better, faster way to do this. Is there a function that does this?

# Trying to write the functions within the dplyr & magrittr framework
library(tidyverse)
library(reshape2) # melt() comes from reshape2, which library(tidyverse) does not attach

mtcars[2,2] <- NA # Add a null to test completeness

# One summary per metric, each melted to long format (one row per column)
total <- mtcars %>% summarise_all(funs(n())) %>% melt
nulls <- mtcars %>% summarise_all(funs(sum(is.na(.)))) %>% melt
filled <- mtcars  %>% summarise_all(funs(sum(!is.na(.)))) %>% melt
uniques <- mtcars %>% summarise_all(funs(length(unique(.)))) %>% melt


mtcars %>% summarise_all(funs(n_distinct(.))) %>% melt


#Build a Data Frame from names of mtcars and add variables with mutate
DataProfile <- as.data.frame(names(mtcars))
DataProfile <- DataProfile %>% mutate(Total = total$value,
                       Nulls = nulls$value,
                       Filled = filled $value,
                       Complete = Filled/Total,
                       Cardinality = uniques$value,
                       Uniqueness = Cardinality/Total,
                       Distinctness = Cardinality/Filled)
DataProfile

#These are other attempts with Base R, but they are harder to read and don't play well with summarise_all
sapply(mtcars, function(x) length(unique(x[!is.na(x)]))) %>% melt
rapply(mtcars,function(x)length(unique(x))) %>% melt

1 Answer:

Answer 0 (score: 2)

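The summarise_all() function can apply several functions at once, so you can consolidate the code into a single pass and then reshape the data to get the "profile" you want for each variable.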

library(tidyverse)
library(reshape2) # needed for melt(); not attached by library(tidyverse)

mtcars[2,2] <- NA # Add a null to test completeness

DataProfile <- mtcars %>% 
  summarise_all(funs("Total" = n(), 
                     "Nulls" = sum(is.na(.)), 
                     "Filled" = sum(!is.na(.)), 
                     "Cardinality" = length(unique(.)))) %>% 
  melt() %>%
  separate(variable, into = c('variable', 'measure'), sep="_") %>%
  spread(measure, value)  %>%
  mutate(Complete = Filled/Total,
         Uniqueness = Cardinality/Total,
         Distinctness = Cardinality/Filled)

DataProfile
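
For readers on newer package versions: funs() was deprecated in dplyr 0.8.0 and the scoped verbs such as summarise_all() were superseded by across() in dplyr 1.0.0. The following is a minimal sketch of the same one-pass idea using across() and tidyr's pivot_longer(), which also removes the need for reshape2. It is not part of the original answer; the working copy `df` and the "__" separator are choices made for this illustration.

```r
library(dplyr)
library(tidyr)

# Work on a copy so the built-in mtcars is left alone (df is an illustrative name)
df <- mtcars
df[2, 2] <- NA # Add an NA to test completeness

DataProfile <- df %>%
  # One pass: every metric for every column, named "<column>__<metric>"
  summarise(across(everything(),
                   list(Total = ~ length(.x),
                        Nulls = ~ sum(is.na(.x)),
                        Filled = ~ sum(!is.na(.x)),
                        Cardinality = ~ n_distinct(.x)),
                   .names = "{.col}__{.fn}")) %>%
  # Split the names at "__": the column part becomes a row label,
  # the metric part becomes its own output column (via ".value")
  pivot_longer(everything(),
               names_to = c("variable", ".value"),
               names_sep = "__") %>%
  mutate(Complete = Filled / Total,
         Uniqueness = Cardinality / Total,
         Distinctness = Cardinality / Filled)

DataProfile
```

Two notes on the choices here. A double underscore is used as the separator so that column names containing a single "_" would still split correctly; the original separate(..., sep = "_") works for mtcars only because none of its column names contain an underscore. Also, n_distinct() counts NA as a distinct value by default, which matches length(unique(.)) in the answer above; pass na.rm = TRUE if NA should not count toward Cardinality.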