Summarise_if with a count for all variable names

Time: 2018-09-03 12:36:47

Tags: r dplyr

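I am trying to get the mean, sum, and count of every column, grouped by the Class variable, but the count with n() (the third statement) fails with this error:

Error: This function should not be called directly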

Class <- c("A","A","A","A","B","B","B","C","C","C","C","C","C")
A<-c(23,33,NA,56,22,34,34,45,65,5,57,75,57)
D<-c(2,133,5,60,23,312,341,25,75,NA,3,9,21)
M<-c(34,35,67,325,46,56,547,47,67,67,68,3,12)

df <- data.frame(Class,A,D,M)
library(dplyr)

system.time(df_sum <- df %>% group_by(Class) %>% summarise_if(is.numeric, sum , na.rm=T))
system.time(df_mean <- df %>% group_by(Class) %>% summarise_if(is.numeric, mean , na.rm=T))

system.time(df_count <- df %>% group_by(Class) %>% summarise_if(is.numeric, n() , na.rm=T))

Please suggest what I need to change in the statement above.

1 answer:

Answer 0: (score: 3)

To get the number of non-NA values in each numeric column, you can use:

library(dplyr)

df %>%
  group_by(Class) %>%
  summarise_if(is.numeric,
               function(x) sum(!is.na(x)))

#output
# A tibble: 3 x 4
  Class     A     D     M
  <fct> <int> <int> <int>
1 A         3     4     4
2 B         3     3     3
3 C         6     5     6
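If all three statistics are wanted in a single result, one option is sketched below. It assumes dplyr 1.0.0 or later, where across() is available; that is not the version used in the question. The names df_stats, sum, mean and n are only illustrative.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  Class = c("A","A","A","A","B","B","B","C","C","C","C","C","C"),
  A = c(23,33,NA,56,22,34,34,45,65,5,57,75,57),
  D = c(2,133,5,60,23,312,341,25,75,NA,3,9,21),
  M = c(34,35,67,325,46,56,547,47,67,67,68,3,12)
)

# Assumption: dplyr >= 1.0.0 (across() and where() inside summarise())
df_stats <- df %>%
  group_by(Class) %>%
  summarise(across(
    where(is.numeric),
    list(
      sum  = ~ sum(.x, na.rm = TRUE),   # sum, ignoring NA
      mean = ~ mean(.x, na.rm = TRUE),  # mean, ignoring NA
      n    = ~ sum(!is.na(.x))          # number of non-NA values
    )
  ))

print(df_stats)
```

With the default naming, the result columns are called A_sum, A_mean, A_n, D_sum and so on, one row per Class.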

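The n() function is not very flexible: it takes no arguments, so there is no na.rm option, and it only returns the number of rows in the current group. It also has to be evaluated inside a verb such as summarise(); writing n() as an argument to summarise_if() calls it directly, which is what triggers the error above.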
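For comparison, here is a minimal sketch of how n() can be used when the plain group size is enough. The column name n_rows is just an illustrative choice.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  Class = c("A","A","A","A","B","B","B","C","C","C","C","C","C"),
  A = c(23,33,NA,56,22,34,34,45,65,5,57,75,57),
  D = c(2,133,5,60,23,312,341,25,75,NA,3,9,21),
  M = c(34,35,67,325,46,56,547,47,67,67,68,3,12)
)

# n() is evaluated inside summarise(), so it sees the current group.
# It counts rows, so NA values in A, D or M are included in the count.
row_counts <- df %>%
  group_by(Class) %>%
  summarise(n_rows = n())

print(row_counts)
```

This gives 4, 3 and 6 rows for classes A, B and C. Column A has only 3 non-NA values in class A, which is why sum(!is.na(x)) is the better fit when missing values should be excluded.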