I create the following data frame:
df <- data.frame(seq(from = as.Date("2001-01-01"), to = as.Date("2001-12-31"), by = 1),
seq(1,365), seq(1, 365), seq(1, 365), seq(1, 365))
colnames(df) <- c("date", "C1", "C2", "C3", "C4")
df$C1[50:100] <- NA
df$C2[20:80] <- NA
df$C3[70:150] <- NA
df$C4[250:300] <- NA
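I want to calculate the percentage of missing values per month, not only for each column but also for the whole data set.
Is there an efficient way to do this?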
Answer 0 (score: 3)
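library(dplyr)
library(lubridate)
# Restrict is.na() to the data columns so the date column is not counted,
# e.g. is.na(.[, 2:5]) OR is.na(.[, grepl("C", colnames(df))])
df %>% mutate(Month = month(date), Mis = rowSums(is.na(.[, 2:5]))) %>%
  group_by(Month) %>%
  # Sum: NA cells in the month; Percentage: share of the month's cells (4 per row) that are NA
  summarise(Sum = sum(Mis), Percentage = 100 * mean(Mis) / 4)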
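The code above gives one overall figure per month. Since the question also asks for each column, here is a minimal sketch that reports both in one table. It assumes dplyr 1.0 or later (for `across()`). The name `monthly_na` and the `All` column are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(lubridate)

# Rebuild the example data from the question
df <- data.frame(date = seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = 1),
                 C1 = 1:365, C2 = 1:365, C3 = 1:365, C4 = 1:365)
df$C1[50:100]  <- NA
df$C2[20:80]   <- NA
df$C3[70:150]  <- NA
df$C4[250:300] <- NA

monthly_na <- df %>%
  mutate(Month = month(date)) %>%
  group_by(Month) %>%
  # Percentage of NA values in each data column for that month
  summarise(across(C1:C4, ~ 100 * mean(is.na(.x)))) %>%
  # Every column has the same number of rows per month, so the mean of the
  # per-column percentages equals the percentage of NA cells over C1..C4
  mutate(All = (C1 + C2 + C3 + C4) / 4)

monthly_na
```

For January, for example, `C2` is missing on days 20 to 31 (12 of 31 days), so `C2` is about 38.7 and `All` is a quarter of that, about 9.7.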