I have data like this:
df <- structure(list(X1 = c(37L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, NA,
11L, 12L), X2 = c(40L, NA, 35L, 35L, 35L, 34L, 29L, 28L, 28L,
NA, 25L, 24L), X3 = c(60L, 44L, 49L, 41L, NA, NA, NA, 25L, 26L,
NA, NA, 22L), T1 = c(19L, 55L, 47L, 46L, 36L, 42L, 25L, NA, 33L,
42L, 50L, 22L), T2 = c(75L, NA, 32L, 44L, 27L, 31L, 17L, NA,
18L, 45L, 10L, 11L), T3 = c(5L, 6L, 7L, 8L, 9L, 10L, 11L, NA,
46L, 36L, 42L, NA)), class = "data.frame", row.names = c(NA,
-12L))
I would like to get the following information:
Number_of_values_X1-X3 11
Number_of_missing_in_X1 1
Number_of_missing_in_X2 2
Number_of_missing_in_X3 5
Number_of_missing_in_X1X2X3 1
Number_of_Missing_in_X1_X2 0
Number_of_missing_in_X1_X3 0
Number_of_missing_in_X2_X3 0
And the same for the other three columns:
Number_of_values_T1-T3 11
Number_of_missing_in_T1 1
Number_of_missing_in_T2 2
Number_of_missing_in_T3 2
Number_of_missing_in_T1T2T3 1
Number_of_missing_in_T1_T2 0
Number_of_missing_in_T1_T3 0
Number_of_missing_in_T2_T3 0
I tried to do this with the following function, but I don't know how to modify it when more columns are involved.
myData <- function(column) {
  # Count the non-missing values; setdiff() would also drop duplicated values
  N_V <- sum(!is.na(df[[column]]))
  N_Missing <- sum(is.na(df[[column]]))
  print(paste("Number of values in", column, N_V))
  print(paste("Number of missing in", column, N_Missing))
}
Answer 0 (score: 0)
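The following gives the total number of NA values in each column, and the number of rows in which all of the selected columns are NA.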
First define a function and compute a logical matrix.
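# Count the rows in which every column in `cols` is TRUE (i.e. NA)
na_cols <- function(X, cols){
  # drop = FALSE keeps a matrix even when a single column is selected
  all_na <- apply(X[, cols, drop = FALSE], 1, all)
  sum(all_na)
}
# Logical matrix: TRUE where df has a missing value
na <- sapply(df, is.na)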
Now the total number of NA values in each column.
apply(na[, 1:3], 2, sum)
#X1 X2 X3
# 1 2 5
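As a possible shortcut, `colSums()` on the logical matrix gives the same per-column totals, and `rowSums()` can count the rows that hold at least one value. That second count seems to be what the question's "Number_of_values_X1-X3" line asks for, though this reading is an assumption. The names `df_x`, `na_x`, `n_missing` and `n_rows_with_value` are illustrative only; the X columns of the question's data are repeated so the snippet runs on its own.

```r
# X columns of the question's data, repeated so the snippet is self-contained
df_x <- data.frame(
  X1 = c(37L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, NA, 11L, 12L),
  X2 = c(40L, NA, 35L, 35L, 35L, 34L, 29L, 28L, 28L, NA, 25L, 24L),
  X3 = c(60L, 44L, 49L, 41L, NA, NA, NA, 25L, 26L, NA, NA, 22L)
)
na_x <- is.na(df_x)  # logical matrix, TRUE where a value is missing

# NA count per column
n_missing <- colSums(na_x)
n_missing
# X1 X2 X3
#  1  2  5

# Rows with at least one non-missing value among X1..X3
n_rows_with_value <- sum(rowSums(!na_x) > 0)
n_rows_with_value
# [1] 11
```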
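Next, the number of rows in which all columns of each combination are NA. Note that the pairwise counts include row 10, where all three columns are missing.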
na_cols(na, 1:3)
#[1] 1
na_cols(na, 1:2)
#[1] 1
na_cols(na, c(1, 3))
#[1] 1
na_cols(na, 2:3)
#[1] 1
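The question expects 0 for the pairs, which suggests it wants rows where exactly the chosen columns are missing while the remaining columns of the group are present. That interpretation is an assumption. Under it, a variant along these lines might work; `na_exact` and its `group` argument are hypothetical names, not part of the answer's code, and the X columns are repeated so the snippet runs on its own.

```r
df_x <- data.frame(
  X1 = c(37L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, NA, 11L, 12L),
  X2 = c(40L, NA, 35L, 35L, 35L, 34L, 29L, 28L, 28L, NA, 25L, 24L),
  X3 = c(60L, 44L, 49L, 41L, NA, NA, NA, 25L, 26L, NA, NA, 22L)
)
na_x <- is.na(df_x)  # logical matrix, TRUE where a value is missing

# Count rows where every column in `cols` is NA and
# every other column of `group` is not NA.
na_exact <- function(X, group, cols) {
  others <- setdiff(group, cols)
  miss <- rowSums(X[, cols, drop = FALSE]) == length(cols)
  present <- rowSums(X[, others, drop = FALSE]) == 0
  sum(miss & present)
}

na_exact(na_x, 1:3, 1:3)      # X1, X2 and X3 all missing
# [1] 1
na_exact(na_x, 1:3, 1:2)      # only X1 and X2 missing
# [1] 0
na_exact(na_x, 1:3, c(1, 3))  # only X1 and X3 missing
# [1] 0
na_exact(na_x, 1:3, 2:3)      # only X2 and X3 missing
# [1] 0
```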
For the columns starting with T, the equivalent code is
apply(na[, 4:6], 2, sum)
na_cols(na, 4:6)
na_cols(na, 4:5)
na_cols(na, c(4, 6))
na_cols(na, 5:6)
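When a group has more columns, typing every combination by hand gets tedious. One option is to generate the combinations with `combn()`. The following is a minimal sketch that reuses the `na_cols` idea from above; `na_all_combos` is a hypothetical helper name, it assumes `cols` holds at least two column indices, and the data is repeated so the snippet runs on its own.

```r
df <- data.frame(
  X1 = c(37L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, NA, 11L, 12L),
  X2 = c(40L, NA, 35L, 35L, 35L, 34L, 29L, 28L, 28L, NA, 25L, 24L),
  X3 = c(60L, 44L, 49L, 41L, NA, NA, NA, 25L, 26L, NA, NA, 22L),
  T1 = c(19L, 55L, 47L, 46L, 36L, 42L, 25L, NA, 33L, 42L, 50L, 22L),
  T2 = c(75L, NA, 32L, 44L, 27L, 31L, 17L, NA, 18L, 45L, 10L, 11L),
  T3 = c(5L, 6L, 7L, 8L, 9L, 10L, 11L, NA, 46L, 36L, 42L, NA)
)
na <- is.na(df)  # logical matrix, TRUE where a value is missing

# Rows in which every column in `cols` is NA
na_cols <- function(X, cols) {
  sum(apply(X[, cols, drop = FALSE], 1, all))
}

# For every combination of two or more columns in `cols`,
# count the rows where all of those columns are NA.
# Assumes length(cols) >= 2.
na_all_combos <- function(X, cols) {
  out <- list()
  for (m in seq(2, length(cols))) {
    for (cc in combn(cols, m, simplify = FALSE)) {
      label <- paste(colnames(X)[cc], collapse = "_")
      out[[label]] <- na_cols(X, cc)
    }
  }
  unlist(out)
}

res_x <- na_all_combos(na, 1:3)  # named counts for X1_X2, X1_X3, X2_X3, X1_X2_X3
res_t <- na_all_combos(na, 4:6)  # named counts for T1_T2, T1_T3, T2_T3, T1_T2_T3
```

As with `na_cols`, each count includes rows where further columns of the group are also missing.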