Check whether specific subjects are in another column, and are duplicated within each column, in R

Time: 2012-07-16 16:20:06

Tags: r duplicates complete

Here is the data. Example 1: complete

complete <- c("A", "B", "C","J", "C1", "L", "J2", "D", "M", "N")
lst1 <- c(NA, NA, NA, "A", "N", NA,"A", "C", "D", NA )
lst2 <- c(NA, NA, NA,"A", "L", NA, "C1", "J2", "J2", "B")
datf <- data.frame (complete, lst1, lst2, stringsAsFactors = FALSE)

Example 2: incomplete and duplicated

complete <- c("A", "B", "C","J", "C1", "L", "C", "D", "M", "N")
lst1 <- c(NA, NA, NA, "A", "N", NA,"A", "C", "D1", NA )
lst2 <- c(NA, NA, NA,"A", "L", NA, "C1", "J2", "J2", "B2")
datf2 <- data.frame (complete, lst1, lst2, stringsAsFactors = FALSE)

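I want to check:

(1) Whether every member of lst1 and lst2 appears at least once in complete. If one does not, the code should stop with a message saying that this "?" is present in lst1 or lst2 (whichever is correct) but not in complete.

My attempt, for Example 1: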

if (datf$lst1 %in%  datf$complete | datf$lst2 %in%  datf$complete) {
     stop ("the subject in lst1 or lst2 must be complete list ")} else {
     cat("I am fine")
     }

I am fineWarning message:
In if (datf$lst1 %in% datf$complete | datf$lst2 %in% datf$complete) { :
  the condition has length > 1 and only the first element will be used

Why does this message appear, and how can I suppress it?

  Example 2:
    if (datf2$lst1 %in%  datf2$complete | datf2$lst2 %in%  datf2$complete) {
         stop ("the subject in lst1 or lst2 must be complete list ")} else {
         cat("I am fine")
         }
   Although this data contains errors, the message is the same:
      I am fineWarning message:
    In if (datf2$lst1 %in% datf2$complete | datf2$lst2 %in% datf2$complete) { :
      the condition has length > 1 and only the first element will be used

Also, is there a way to include the unmatched names in the error message?

(2) Whether any member of complete is duplicated.

Edit:

Expected answer:
Example 1: all members of lst1 and lst2 are also members of complete.

The expected message here is "I am fine".

Example 2:
B2 and J2 are members of lst2 but not of complete; D1 is a member of lst1 but not of complete.
complete has two C's, so C is duplicated.
The function should stop and print a message:

"B2 and J2 are members of lst2, but not in complete
 D1 is a member of lst1, but not in complete,
 check completeness"
"C is duplicated in complete"

1 answer:

Answer 0 (score: 1)

> datf$lst1 %in% datf$complete | datf$lst2 %in% datf$complete
 [1] FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

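From ?'if': the condition passed to if must be a logical vector of length one that is not NA. Here the condition has length 10 (see the output above), so only its first element is used, which is what the warning reports.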
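
One way to get a single TRUE/FALSE for if is to collapse the element-wise result with all() (or any()). The following is only a sketch using the Example 1 data, and it assumes that NA entries in lst1 and lst2 should be ignored rather than treated as missing subjects. Note that recent versions of R (4.2.0 and later) raise an error instead of a warning when the condition has length greater than one.

```r
complete <- c("A", "B", "C", "J", "C1", "L", "J2", "D", "M", "N")
lst1 <- c(NA, NA, NA, "A", "N", NA, "A", "C", "D", NA)
lst2 <- c(NA, NA, NA, "A", "L", NA, "C1", "J2", "J2", "B")

# Drop NAs, then reduce the element-wise %in% result to one logical value
ok1 <- all(na.omit(lst1) %in% complete)
ok2 <- all(na.omit(lst2) %in% complete)

# && works on length-one logicals, so the condition is a single TRUE/FALSE
if (!(ok1 && ok2)) {
  stop("the subjects in lst1 and lst2 must all be in complete")
} else {
  cat("I am fine\n")
}
```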

> na.omit(datf2$lst1)[!na.omit(datf2$lst1)%in%datf2$complete]
[1] "D1"
> na.omit(datf2$lst2)[!na.omit(datf2$lst2)%in%datf2$complete]
[1] "J2" "J2" "B2"

> datf2$complete[duplicated(datf2$complete)]
[1] "C"

The above should help you build a function that does what you describe.
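
As a rough sketch of how those pieces could be combined, the function below collects the unmatched and duplicated names and puts them in the error message. The name check_lists and the exact message wording are assumptions made for illustration, not part of the original post; NA entries are assumed to be ignorable.

```r
# check_lists() is a hypothetical helper name chosen for this sketch
check_lists <- function(complete, lst1, lst2) {
  msgs <- character(0)

  # Non-NA values of each list that do not occur in complete
  # (setdiff() also removes repeated values such as "J2")
  miss1 <- setdiff(na.omit(lst1), complete)
  miss2 <- setdiff(na.omit(lst2), complete)
  if (length(miss1) > 0) {
    msgs <- c(msgs, paste(paste(miss1, collapse = ", "),
                          "in lst1 but not in complete"))
  }
  if (length(miss2) > 0) {
    msgs <- c(msgs, paste(paste(miss2, collapse = ", "),
                          "in lst2 but not in complete"))
  }

  # Values that occur more than once in complete
  dups <- unique(complete[duplicated(complete)])
  if (length(dups) > 0) {
    msgs <- c(msgs, paste(paste(dups, collapse = ", "),
                          "duplicated in complete"))
  }

  # Stop with all collected problems, otherwise report success
  if (length(msgs) > 0) {
    stop(paste(msgs, collapse = "\n"), call. = FALSE)
  }
  cat("I am fine\n")
  invisible(TRUE)
}

# Example 1 data: passes
complete1 <- c("A", "B", "C", "J", "C1", "L", "J2", "D", "M", "N")
lst1a <- c(NA, NA, NA, "A", "N", NA, "A", "C", "D", NA)
lst2a <- c(NA, NA, NA, "A", "L", NA, "C1", "J2", "J2", "B")
res1 <- check_lists(complete1, lst1a, lst2a)

# Example 2 data: stops; tryCatch() captures the message for inspection
complete2 <- c("A", "B", "C", "J", "C1", "L", "C", "D", "M", "N")
lst1b <- c(NA, NA, NA, "A", "N", NA, "A", "C", "D1", NA)
lst2b <- c(NA, NA, NA, "A", "L", NA, "C1", "J2", "J2", "B2")
res2 <- tryCatch(check_lists(complete2, lst1b, lst2b),
                 error = function(e) conditionMessage(e))
cat(res2, "\n")
```

Collecting every problem before calling stop() means one run reports all unmatched and duplicated names at once, rather than stopping at the first.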