Check multiple data frame columns at once (flexible way)

Time: 2016-09-17 09:40:37

Tags: r dataframe vectorization apply

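Looking for a better way: how can I have R check, element-wise, the values in a flexible subset of several columns (here, say, Var2 and Var3) and write the result of the check into a new logical column?

Is there a shorter, more elegant way than using row-wise apply() here?

I could also write the check out explicitly, column by column, but that is not flexible. My current row-wise version: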

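# Sample data: empty strings mark missing entries
df <- read.csv(
  text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"',
  stringsAsFactors = FALSE  # keep columns as character (the default since R 4.0.0)
)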

criticalColumns <- c("Var2", "Var3")

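df$criticalColumnsAreEmpty <-
  apply(df[, criticalColumns, drop = FALSE], 1, function(curRow) {
    # TRUE only if every critical column in this row is empty;
    # drop = FALSE keeps a data frame even if only one column is selected
    all(curRow == "")
  })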

Desired output, written the explicit (inflexible) way:

df$criticalColumnsAreEmpty <- df$Var2 == "" & df$Var3 == ""

1 Answer:

Answer 0: (score: 1)

We can use rowSums on a logical matrix:

df$criticalColumnsAreEmpty <- !rowSums(df[criticalColumns]!="")
df$criticalColumnsAreEmpty
#[1]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
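
To see why the one-liner works, it may help to split it into steps. The following is only a sketch on a small made-up data frame (`toy`, `cols`, `nonEmpty`, `nonEmptyCount` and `allEmpty` are illustrative names, not from the question): the comparison yields a logical matrix, rowSums counts the non-empty cells per row, and a count of zero means every checked column is empty.

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(
  Var1 = c("a", "", "a"),
  Var2 = c("", "a", ""),
  Var3 = c("", "", "a"),
  stringsAsFactors = FALSE
)
cols <- c("Var2", "Var3")

nonEmpty <- toy[cols] != ""         # logical matrix, one column per checked variable
nonEmptyCount <- rowSums(nonEmpty)  # number of non-empty cells in each row
allEmpty <- nonEmptyCount == 0      # same logical values as !rowSums(...)
```

If the checked columns may contain NA, rowSums returns NA for those rows unless na.rm = TRUE is passed, so it is worth deciding up front how missing values should be treated.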

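Another option (for big datasets, to avoid the conversion to a matrix for memory reasons) is to loop over the columns, check whether each element is empty, and combine the results with Reduce and `&`: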

Reduce(`&`, lapply(df[criticalColumns], function(x) !nzchar(as.character(x))))
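
The column-wise version can be wrapped in a small helper so that the set of columns stays a parameter. This is a minimal sketch: `allColumnsEmpty`, `toy`, `viaReduce` and `viaApply` are hypothetical names introduced here, and the comparison with apply() is only there to show that both approaches agree on a toy example.

```r
# Hypothetical helper: TRUE for rows where every listed column is an empty string
allColumnsEmpty <- function(data, columns) {
  # One logical vector per column; as.character() also covers factor columns
  checks <- lapply(data[columns], function(x) !nzchar(as.character(x)))
  # Combine the per-column vectors element-wise with a logical AND
  Reduce(`&`, checks)
}

# Hypothetical toy data, not the data from the question
toy <- data.frame(
  Var2 = c("", "a", ""),
  Var3 = c("", "", "a"),
  stringsAsFactors = FALSE
)

viaReduce <- allColumnsEmpty(toy, c("Var2", "Var3"))

# Row-wise equivalent, for comparison only
viaApply <- apply(toy[, c("Var2", "Var3"), drop = FALSE], 1,
                  function(r) all(r == ""))
```

In the question's setting, the call would presumably look like df$criticalColumnsAreEmpty <- allColumnsEmpty(df, criticalColumns).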