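I would like to filter out columns that have at least one missing value, after first getting the number of such columns and their names. I use this code to get the names and count of columns that contain only missing values, and then filter them out of the data frame: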
CheckColAllNulls <- if (ncol(Filter(function(x) all(is.na(x)), df)) > 0) {
cat("columns have only nulls:",ncol(Filter( function(x) all(is.na(x)), df)))
cat("columns names that have only nulls:",colnames(Filter( function(x) all(is.na(x)), df)))
df <- Filter(function(x) ! all(is.na(x)), df)
print("columns having only nulls removed ")
} else {
print("No columns having only nulls are found")
}
I tried to do the same for columns that have at least one missing value, using colSums, but it did not work:
CheckColNulls <- if ( colSums(is.na(df)) > 0) {
cat("columns have more than one null:",ncol(colSums(is.na(df)) > 0 ))
cat("columns names that have more than one null:",colnames(colSums(is.na(df)) > 0 ))
df <- Filter(function(x) colSums(is.na(x)) > 0, df)
print("columns having at least one null removed ")
} else {
print("No columns having at least one null are found")
}
This is the error I get:
Error in colSums(is.na(x)) :
'x' must be an array of at least two dimensions
In addition: Warning message:
In if (colSums(is.na(df)) > 0) { :
the condition has length > 1 and only the first element will be used
Answer 0 (score: 0)
Here is my solution using any(is.na(x)):
# Columns that contain at least one NA
cols_with_na <- Filter(function(x) any(is.na(x)), df)
if (ncol(cols_with_na) > 0) {
  cat("Number of columns with at least one null:", ncol(cols_with_na))
  cat("\n\n")
  cat("Names of columns with at least one null:", colnames(cols_with_na))
  cat("\n\n")
  # Keep only the columns that have no NAs
  df <- Filter(function(x) !any(is.na(x)), df)
  print("columns having any nulls are removed")
} else {
  print("No columns having any nulls are found")
}
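
For reference, the colSums attempt in the question can probably be made to work as well. The error appears because Filter passes each column to the function as a plain vector, while colSums needs a two-dimensional object, and the warning appears because if() expects a single TRUE/FALSE rather than one value per column. Below is a minimal sketch of that approach, assuming a small made-up data frame (the columns a, b, c, d are only illustrative, not the asker's data) and the hypothetical helper names na_counts and has_na:

```r
# Hypothetical example data; `df` stands in for the asker's data frame
df <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 2, 3),
  c = c(NA, NA, NA),
  d = c("x", "y", "z")
)

# colSums() is applied to the whole data frame at once and returns
# one NA count per column
na_counts <- colSums(is.na(df))
has_na <- na_counts > 0

# any() collapses the per-column logical vector to a single value for if()
if (any(has_na)) {
  cat("Number of columns with at least one NA:", sum(has_na), "\n")
  cat("Names of columns with at least one NA:", names(df)[has_na], "\n")
  # drop = FALSE keeps the result a data frame even if only one column remains
  df <- df[, !has_na, drop = FALSE]
} else {
  cat("No columns with at least one NA were found\n")
}
```

With this example data, columns b and c are flagged and only a and d remain in df. Computing has_na once also avoids scanning the data frame separately for the count, the names, and the filtering step.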