Count the columns of a data frame that contain specific values

Time: 2019-06-03 14:15:43

Tags: r count

I have the following reproducible dataset called temp:

temp=as.data.frame(cbind(c("x3","x2","x1",NA),c("x5","x2","x1",NA),c("x2","x3","x1",NA),c("x3","x2","x1","x4"),c("x1","x2",NA,NA)))

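I want to count how many columns of temp contain the values c("x3","x2","x1") in any order (for example c("x1","x2","x3")). So the expected output is 2. Unfortunately, the following does not give the correct result:

sum(sapply(temp, function(x) all(x[!is.na(x)] %in% c("x1","x2","x3"))))

How can I count the number of columns that contain certain values, in every possible ordering?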

2 Answers:

Answer 0: (score: 1)

Your reprex:

temp <- as.data.frame(
  cbind(
    c("x3", "x2", "x1",  NA ),
    c("x5", "x2", "x1",  NA ),
    c("x2", "x3", "x1",  NA ),
    c("x3", "x2", "x1", "x4"),
    c("x1", "x2",  NA ,  NA )
  )
)
target <- c("x3", "x2", "x1")

Then, if you want to check that a column contains only those three levels:

sum(sapply(temp, function(x) setequal(target, levels(x))))

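setequal() checks whether two sets are equal, regardless of order. levels() tells you every distinct value present in the column; it works here because you did not set stringsAsFactors = FALSE, so the columns are factors. Note that since R 4.0.0 stringsAsFactors defaults to FALSE, in which case the columns are character vectors, levels() returns NULL, and this variant counts 0.

This does the same thing, and works for both factor and character columns: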

sum(sapply(temp, function(x) setequal(target, na.omit(x))))
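To see why the attempt in the question over-counts, it may help to compare the two tests column by column. The following is only a sketch: it assumes character columns (stringsAsFactors = FALSE), and the names subset_test and set_test are made up for illustration.

```r
# Rebuild the example data with character columns (an assumption; the
# question's original call produced factors on R versions before 4.0.0).
temp <- as.data.frame(
  cbind(
    c("x3", "x2", "x1", NA),
    c("x5", "x2", "x1", NA),
    c("x2", "x3", "x1", NA),
    c("x3", "x2", "x1", "x4"),
    c("x1", "x2", NA, NA)
  ),
  stringsAsFactors = FALSE
)
target <- c("x3", "x2", "x1")

# Test from the question: every non-missing value must be in target.
# This is only a subset check, so a column holding just "x1" and "x2" passes.
subset_test <- vapply(
  temp, function(x) all(x[!is.na(x)] %in% target), logical(1)
)

# Set-equality test: the non-missing values must be exactly the target set.
set_test <- vapply(
  temp, function(x) setequal(target, x[!is.na(x)]), logical(1)
)

print(rbind(subset_test, set_test))
print(sum(set_test))
```

The fifth column passes the subset check but fails the set-equality check, which accounts for the difference between the two counts.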

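If you also want to check that each element appears the same number of times, compare the sorted vectors with identical(), using as.character() to convert the factor to a character vector.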

sum(sapply(temp, function(x) {
  identical(sort(target), sort(as.character(na.omit(x))))
}))

(Or just set stringsAsFactors = FALSE when building the original dataset, and you won't need as.character() here.)
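The difference between the set comparison and the sorted comparison only shows up when a value is repeated. A small illustration with a made-up vector (dup is hypothetical and not part of the question's data):

```r
target <- c("x3", "x2", "x1")

# Hypothetical column in which "x1" appears twice.
dup <- c("x1", "x1", "x2", "x3", NA)
dup_values <- dup[!is.na(dup)]

# A set comparison ignores repeats, so the two are considered equal.
print(setequal(target, dup_values))

# Comparing the sorted vectors also compares how often each value occurs,
# so the extra "x1" makes them differ.
print(identical(sort(target), sort(dup_values)))
```

For the data in the question no column repeats a value, so both approaches give the same count there.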

Answer 1: (score: 0)

This should work. It checks whether the unique values of each column are the same as the pattern:

  data <- as.data.frame(cbind(c("x3","x2","x1",NA),c("x5","x2","x1",NA),c("x2","x3","x1",NA),c("x3","x2","x1","x4"),c("x1","x2",NA,NA)))
  vector_pattern <- c("x3","x2","x1")

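  nvect <- length(vector_pattern)
  cont <- 0
  for (i in seq_len(ncol(data))) {
    # Distinct non-missing values in column i
    aa <- unique(data[, i])
    aa <- aa[!is.na(aa)]

    # Count the column if every value is in the pattern and the number of
    # distinct values matches the pattern length
    if (all(!is.na(match(aa, vector_pattern))) && length(aa) == nvect) {
      cont <- cont + 1
    }
  }
  print(cont)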
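The same test can probably be written without an explicit loop. This is a sketch rather than a definitive version: it rebuilds the example data so that it runs on its own, count_pattern_columns is a name invented here, and it assumes the pattern itself contains no duplicates.

```r
data <- as.data.frame(cbind(
  c("x3", "x2", "x1", NA),
  c("x5", "x2", "x1", NA),
  c("x2", "x3", "x1", NA),
  c("x3", "x2", "x1", "x4"),
  c("x1", "x2", NA, NA)
))
vector_pattern <- c("x3", "x2", "x1")

# Hypothetical helper: counts the columns whose distinct non-missing values
# are exactly the values in `pattern` (assumed to be free of duplicates).
count_pattern_columns <- function(df, pattern) {
  hits <- vapply(df, function(col) {
    # as.character() makes this work for factor and character columns alike
    vals <- unique(as.character(col))
    vals <- vals[!is.na(vals)]
    length(vals) == length(pattern) && all(vals %in% pattern)
  }, logical(1))
  sum(hits)
}

print(count_pattern_columns(data, vector_pattern))
```

vapply() with logical(1) is used instead of sapply() so that the result is always a logical vector, even for a data frame with zero columns.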