How to find the differences between several columns

Time: 2019-05-17 13:48:03

Tags: r

I have data like this

df <- structure(list(X1 = c(37L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, NA, 
11L, 12L), X2 = c(40L, NA, 35L, 35L, 35L, 34L, 29L, 28L, 28L, 
NA, 25L, 24L), X3 = c(60L, 44L, 49L, 41L, NA, NA, NA, 25L, 26L, 
NA, NA, 22L), T1 = c(19L, 55L, 47L, 46L, 36L, 42L, 25L, NA, 33L, 
42L, 50L, 22L), T2 = c(75L, NA, 32L, 44L, 27L, 31L, 17L, NA, 
18L, 45L, 10L, 11L), T3 = c(5L, 6L, 7L, 8L, 9L, 10L, 11L, NA, 
46L, 36L, 42L, NA)), class = "data.frame", row.names = c(NA, 
-12L))

I would like to be able to obtain the following information

Number_of_values_X1-X3  11
Number_of_missing_in_X1 1
Number_of_missing_in_X2 2
Number_of_missing_in_X3 5
Number_of_missing_in_X1X2X3 1
Number_of_missing_in_X1_X2  0
Number_of_missing_in_X1_X3  0
Number_of_missing_in_X2_X3  0

And the same for the other three columns

Number_of_values_T1-T3 11
Number_of_missing_in_T1 1
Number_of_missing_in_T2 2
Number_of_missing_in_T3 2
Number_of_missing_in_T1T2T3 1
Number_of_missing_in_T1_T2  0
Number_of_missing_in_T1_T3  0
Number_of_missing_in_T2_T3  0

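I tried to do this with the following function, but I don't know how to modify it when more columns are involved

myData <- function(column) {
  # Number of distinct non-missing values in one column of df
  N_V <- length(setdiff(df[[column]], NA))
  # Number of missing values in the same column
  N_Missing <- sum(is.na(df[[column]]))
  print(paste("Number of values in", column, N_V))
  print(paste("Number of missing in", column, N_Missing))
}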

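1 Answer:

Answer 0: (score: 0)

The following gives the total number of NA values in each column, and the number of rows in which every one of the selected columns is NA.
First define a function and compute a logical matrix.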

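# Count the rows of the logical matrix X in which all columns in `cols` are TRUE
na_cols <- function(X, cols){
  # drop = FALSE keeps X a matrix even when a single column is selected
  all_na <- apply(X[, cols, drop = FALSE], 1, function(y) Reduce('&', y))
  sum(all_na)
}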

na <- sapply(df, is.na)

Now the total number of NA values in each column.

apply(na[, 1:3], 2, sum)
#X1 X2 X3 
# 1  2  5 
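
The per-column totals can also be obtained with `colSums`, and the question's "Number_of_values_X1-X3" appears to be the number of rows with at least one non-missing value in the group. That reading is an assumption based on the expected value of 11. A minimal sketch, with the question's data rebuilt compactly so the block runs on its own:

```r
# Same values as in the question, written compactly
df <- data.frame(
  X1 = c(37, 2, 3, 4, 5, 6, 7, 8, 9, NA, 11, 12),
  X2 = c(40, NA, 35, 35, 35, 34, 29, 28, 28, NA, 25, 24),
  X3 = c(60, 44, 49, 41, NA, NA, NA, 25, 26, NA, NA, 22),
  T1 = c(19, 55, 47, 46, 36, 42, 25, NA, 33, 42, 50, 22),
  T2 = c(75, NA, 32, 44, 27, 31, 17, NA, 18, 45, 10, 11),
  T3 = c(5, 6, 7, 8, 9, 10, 11, NA, 46, 36, 42, NA)
)

na <- is.na(df)                 # logical matrix, TRUE where a value is missing
grp <- c("X1", "X2", "X3")

# Missing values per column (same result as apply(..., 2, sum))
per_column <- colSums(na[, grp])
per_column

# Rows that have at least one non-missing value among the group's columns
rows_with_value <- sum(rowSums(!na[, grp]) > 0)
rows_with_value
#[1] 11
```

Only row 10 is missing in all three X columns, which is why 11 of the 12 rows are counted.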

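And the number of rows in which all columns of each combination are NA. Row 10 is missing in all three X columns, so it is counted for every pair as well, giving 1 rather than the 0 listed in the question.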

na_cols(na, 1:3)
#[1] 1

na_cols(na, 1:2)
#[1] 1

na_cols(na, c(1, 3))
#[1] 1

na_cols(na, 2:3)
#[1] 1
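
If the pairwise counts should cover only rows where exactly those columns are missing and the rest of the group is present (which seems to be what produces the 0s in the question), a sketch along the following lines could be used. The helper name `na_exact` and its `group` argument are hypothetical, introduced here for illustration only.

```r
# Same values as in the question, written compactly
df <- data.frame(
  X1 = c(37, 2, 3, 4, 5, 6, 7, 8, 9, NA, 11, 12),
  X2 = c(40, NA, 35, 35, 35, 34, 29, 28, 28, NA, 25, 24),
  X3 = c(60, 44, 49, 41, NA, NA, NA, 25, 26, NA, NA, 22)
)
na <- is.na(df)  # logical matrix, TRUE where a value is missing

# Hypothetical helper: count rows where every column in `cols` is NA
# and every other column of `group` is not NA.
na_exact <- function(na, cols, group) {
  others <- setdiff(group, cols)
  all_missing  <- rowSums(na[, cols, drop = FALSE]) == length(cols)
  rest_present <- rowSums(na[, others, drop = FALSE]) == 0
  sum(all_missing & rest_present)
}

grp <- c("X1", "X2", "X3")
na_exact(na, c("X1", "X2"), grp)
#[1] 0
na_exact(na, c("X1", "X3"), grp)
#[1] 0
na_exact(na, c("X2", "X3"), grp)
#[1] 0
na_exact(na, grp, grp)
#[1] 1
```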

For the columns starting with T, the equivalent code is

apply(na[, 4:6], 2, sum)
na_cols(na, 4:6)
na_cols(na, 4:5)
na_cols(na, c(4, 6))
na_cols(na, 5:6)
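
To avoid writing out every combination by hand when more columns are involved, the calls above could be generated with `combn`. This is only a sketch: `na_summary` is a hypothetical name, and it keeps the same "all selected columns are NA" counting as `na_cols`.

```r
# Same values as in the question, written compactly
df <- data.frame(
  X1 = c(37, 2, 3, 4, 5, 6, 7, 8, 9, NA, 11, 12),
  X2 = c(40, NA, 35, 35, 35, 34, 29, 28, 28, NA, 25, 24),
  X3 = c(60, 44, 49, 41, NA, NA, NA, 25, 26, NA, NA, 22),
  T1 = c(19, 55, 47, 46, 36, 42, 25, NA, 33, 42, 50, 22),
  T2 = c(75, NA, 32, 44, 27, 31, 17, NA, 18, 45, 10, 11),
  T3 = c(5, 6, 7, 8, 9, 10, 11, NA, 46, 36, 42, NA)
)

# Hypothetical helper: for every non-empty combination of `cols`,
# count the rows of `data` in which all columns of that combination are NA.
na_summary <- function(data, cols) {
  na <- is.na(data[, cols, drop = FALSE])
  # List of all combinations of size 1, 2, ..., length(cols)
  combos <- unlist(
    lapply(seq_along(cols), function(k) combn(cols, k, simplify = FALSE)),
    recursive = FALSE
  )
  counts <- vapply(
    combos,
    function(cc) sum(rowSums(na[, cc, drop = FALSE]) == length(cc)),
    integer(1)
  )
  names(counts) <- vapply(combos, paste, character(1), collapse = "_")
  counts
}

x_counts <- na_summary(df, c("X1", "X2", "X3"))
t_counts <- na_summary(df, c("T1", "T2", "T3"))
x_counts
t_counts
```

For the X columns this returns 1, 2 and 5 for the single columns and 1 for each pair and for the triple. For the T columns it returns 1, 2 and 2, followed by 1 for each combination.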