How to get the number and names of columns that have at least one null value

Time: 2016-01-27 10:00:31

Tags: r dataframe

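I want to get the number and names of the columns that have at least one missing value, and then filter those columns out. I use this function to get the names and number of the columns that contain only missing values, and then remove them from the data frame: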

CheckColAllNulls <- if (ncol(Filter(function(x) all(is.na(x)), df)) > 0) {
  cat("columns have only nulls:",ncol(Filter( function(x) all(is.na(x)), df)))
  cat("columns names that have only nulls:",colnames(Filter( function(x) all(is.na(x)), df)))
  df <- Filter(function(x) ! all(is.na(x)), df)
  print("columns having only nulls removed ")
} else {
  print("No columns having only nulls are found")
}

I tried to do the same for columns that have at least one missing value, using colSums, but without success.

CheckColNulls <- if ( colSums(is.na(df)) > 0) {
  cat("columns have more than one null:",ncol(colSums(is.na(df)) > 0 ))
  cat("columns names that have more than one null:",colnames(colSums(is.na(df)) > 0 ))
  df <- Filter(function(x) colSums(is.na(x)) > 0, df)
  print("columns having at least one null removed ")
} else {
  print("No columns having at least one null are found")
}

This is the error I get:

Error in colSums(is.na(x)) : 
  'x' must be an array of at least two dimensions
In addition: Warning message:
In if (colSums(is.na(df)) > 0) { :
  the condition has length > 1 and only the first element will be used

1 answer:

Answer 0: (score: 0)

Here is my solution, using any(is.na(x)):

CheckColAllNulls <- if (ncol(Filter(function(x) any(is.na(x)), df)) > 0) {
    # Report how many columns contain at least one NA, and their names
    cat("number of columns with at least one null:", ncol(Filter(function(x) any(is.na(x)), df)))
    cat("\n\n")
    cat("names of columns with at least one null:", colnames(Filter(function(x) any(is.na(x)), df)))
    cat("\n\n")
    # Keep only the columns that contain no NA at all
    df <- Filter(function(x) !any(is.na(x)), df)
    print("columns having any nulls are removed")
} else {
    print("No columns having any nulls are found")
}
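
The colSums attempt from the question can probably be made to work as well. colSums(is.na(df)) returns one NA count per column, so comparing it to 0 gives a vector with one element per column. `if` needs a single TRUE or FALSE, which explains the warning. Filter passes each column to the function as a plain vector, and colSums rejects anything with fewer than two dimensions, which explains the error. Below is a minimal sketch of that route. The data frame and the names na_counts and na_cols are made up for illustration and are not from the question:

```r
# Hypothetical example data: column b has one NA, column c is all NA
df <- data.frame(a = 1:3, b = c(1, NA, 3), c = c(NA, NA, NA))

# One NA count per column, returned as a named numeric vector
na_counts <- colSums(is.na(df))

# Names of the columns that contain at least one NA
na_cols <- names(na_counts)[na_counts > 0]

# length(na_cols) > 0 is a single TRUE/FALSE, which is what `if` expects
if (length(na_cols) > 0) {
  cat("number of columns with at least one null:", length(na_cols), "\n")
  cat("names of columns with at least one null:", na_cols, "\n")
  # Keep only the columns without missing values;
  # drop = FALSE keeps the result a data frame even if one column remains
  df <- df[, na_counts == 0, drop = FALSE]
  print("columns having at least one null removed")
} else {
  print("No columns having at least one null are found")
}
```

Computing the counts once and reusing them avoids running the same Filter call three times. Note that in R 4.2.0 and later a condition of length greater than one in `if` is an error rather than a warning.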