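Looking for a better way: how can I make R check, element-wise, the values of a flexible subset of several columns (here, say, Var2 and Var3) and write the result of the check to a new logical column?
Is there a shorter, more elegant way than the row-wise apply() used here?
I could also write the check out explicitly, as in the expected output at the end, but that is not flexible.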
df <- read.csv(
text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"'
)
criticalColumns <- c("Var2", "Var3")
df$criticalColumnsAreEmpty <-
  apply(df[, criticalColumns], 1, function(curRow) {
    return(all(curRow == ""))
  })
Expected output:
df$criticalColumnsAreEmpty <- df$Var2 == "" & df$Var3 == ""
Answer 0 (score: 1)
We can use rowSums on the logical matrix:
df$criticalColumnsAreEmpty <- !rowSums(df[criticalColumns]!="")
df$criticalColumnsAreEmpty
#[1] TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
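To see why this works, it may help to unpack the one-liner into steps. The following is only a sketch on a smaller, made-up data frame (toy and its contents are illustrative, not taken from the question). It assumes the critical columns contain no NA values, since rowSums() returns NA for rows with missing cells unless na.rm = TRUE is set.

```r
# Toy data (hypothetical), same kind of problem as in the question
toy <- data.frame(
  Var1 = c("", "a", ""),
  Var2 = c("", "", "a"),
  Var3 = c("", "a", ""),
  stringsAsFactors = FALSE
)
criticalColumns <- c("Var2", "Var3")

# Step 1: comparing a data frame element-wise yields a logical matrix
nonEmpty <- toy[criticalColumns] != ""

# Step 2: count the non-empty cells in each row (TRUE counts as 1)
nNonEmpty <- rowSums(nonEmpty)

# Step 3: a row is "all empty" when that count is zero;
# ! turns 0 into TRUE and any positive count into FALSE
allEmpty <- !nNonEmpty
```

Because the comparison is applied to whatever columns criticalColumns names, changing that vector is all that is needed to check a different subset.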
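Another option (for large datasets, to avoid the conversion to a matrix for memory reasons) is to loop over the columns, check whether each element is empty, and combine the results with Reduce and &: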
Reduce(`&`, lapply(df[criticalColumns], function(x) !nzchar(as.character(x))))
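If the check is needed in several places, the Reduce approach could be wrapped in a small function. This is a minimal sketch: allColumnsEmpty is a hypothetical helper name, and the data frame below reuses only the first five rows of the question's data. Note that nzchar() treats NA as non-empty by default, so rows with NA in a critical column would come out as FALSE.

```r
# Hypothetical helper: TRUE for rows where every listed column is ""
allColumnsEmpty <- function(data, columns) {
  # One logical vector per column; as.character() guards against factors
  perColumn <- lapply(data[columns], function(x) !nzchar(as.character(x)))
  # Combine the per-column vectors element-wise with a logical AND
  Reduce(`&`, perColumn)
}

# First five rows of the question's data, built directly
df <- data.frame(
  Var1 = c("", "", "", "a", "a"),
  Var2 = c("", "", "a", "a", ""),
  Var3 = c("", "a", "", "a", "a"),
  stringsAsFactors = FALSE
)

# Write the check result to a new logical column
df$criticalColumnsAreEmpty <- allColumnsEmpty(df, c("Var2", "Var3"))
```

Unlike apply(df[, criticalColumns], 1, ...), this form also works when only a single column name is passed, because data[columns] stays a data frame.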