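I am trying to create a new column based on whether several columns in the same row contain exactly the same string. If all of the columns match exactly, I want to put 1 in the new column; if there is at least one mismatch among the columns, I want to put 0. Here is a sample of the data: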
  ID var1 var2 var3
1  1  abc  def  abc
2  2  def  xyz  jkl
3  3  ghi  abc  abc
4  4  jkl  jkl  jkl
5  5  jkl  jkl   NA
6  6  abc   NA   NA
...
The final data should look like this:
  ID var1 var2 var3 var_match
1  1  abc  def  abc         0
2  2  def  xyz  jkl         0
3  3  ghi  abc  abc         0
4  4  jkl  jkl  jkl         1
5  5  jkl  jkl   NA         1
6  6  abc   NA   NA        NA
...
I tried the following code:
df$var_match <- 0
df <- within(df, { var_match <- ifelse(var1 == var2 & var1 == var3, 1, 0) })
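However, this does not handle NA (as in row 5): the result there is NA rather than the desired 1. Please let me know if there is a workaround. Thanks in advance!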
Answer 0 (score: 0)
One option is
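# compare var1 with var2 and var3 (columns 3 and 4); NA wherever a value is missing
i1 <- df$var1 == df[3:4]
# 1 if there is no known mismatch, 0 otherwise; NA when both comparisons are NA
df$var_match <- as.integer(!rowSums(!i1, na.rm = TRUE) *
    NA^(rowSums(is.na(i1)) == 2))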
df$var_match
#[1] 0 0 0 1 1 NA
# data
df <- structure(list(ID = 1:6, var1 = c("abc", "def", "ghi", "jkl",
"jkl", "abc"), var2 = c("def", "xyz", "abc", "jkl", "jkl", NA
), var3 = c("abc", "jkl", "abc", "jkl", NA, NA)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
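The one-liner in this answer is fairly dense, so here is a step-by-step sketch of what it appears to compute, using the same sample data. The intermediate names `n_mismatch`, `n_unknown` and `na_mask` are introduced here for illustration only and are not part of the original answer. It relies on two base R behaviours: `NA^0` evaluates to 1, and `!` binds more loosely than `*`, so the product is formed before it is negated.

```r
# Sample data copied from the answer above
df <- data.frame(
  ID   = 1:6,
  var1 = c("abc", "def", "ghi", "jkl", "jkl", "abc"),
  var2 = c("def", "xyz", "abc", "jkl", "jkl", NA),
  var3 = c("abc", "jkl", "abc", "jkl", NA, NA),
  stringsAsFactors = FALSE
)

# Compare var1 against var2 and var3; the result is a 6 x 2 logical matrix
# that holds NA wherever var2 or var3 is missing
i1 <- df$var1 == df[3:4]

# Number of known mismatches per row (NA comparisons are dropped)
n_mismatch <- rowSums(!i1, na.rm = TRUE)

# Number of comparisons per row that could not be evaluated
n_unknown <- rowSums(is.na(i1))

# NA^TRUE is NA and NA^FALSE is 1, so the mask is NA only where both
# comparisons are NA, and 1 everywhere else
na_mask <- NA^(n_unknown == 2)

# Multiply first, then negate: zero mismatches gives TRUE (1),
# any mismatch gives FALSE (0), and a fully unknown row stays NA
var_match <- as.integer(!(n_mismatch * na_mask))
var_match
```

With the sample data, row 5 has one matching comparison and one unknown comparison, so it counts as a match, while row 6 has nothing to compare against and stays NA.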
Answer 1 (score: 0)
In base R, you could do the following...
df$var_match <- as.integer(      #convert to 1/0 from TRUE/FALSE
  apply(df[, -1],                #run through df, excluding col 1
        1,                       #by rows
        function(x) {length(unique(x[!is.na(x)])) == 1 &   #test for one distinct value
                     sum(!is.na(x)) > 1}))                 #but more than one non-NA
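Note that the approach in this answer gives 0, not NA, for a row with only one non-NA value (row 6), whereas the expected output in the question shows NA there. Below is a minimal sketch of a variant that returns NA in that case. The helper name `row_match` and the `cols` vector are hypothetical and not from the original answer. Selecting the columns by name also avoids picking up an existing `var_match` column by accident.

```r
# Sample data from the question
df <- data.frame(
  ID   = 1:6,
  var1 = c("abc", "def", "ghi", "jkl", "jkl", "abc"),
  var2 = c("def", "xyz", "abc", "jkl", "jkl", NA),
  var3 = c("abc", "jkl", "abc", "jkl", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if all non-NA values agree, 0 if any differ,
# NA if fewer than two non-NA values are available to compare
row_match <- function(x) {
  vals <- x[!is.na(x)]
  if (length(vals) < 2) return(NA_integer_)
  as.integer(length(unique(vals)) == 1)
}

# Columns to compare, selected by name (an assumption of this sketch)
cols <- c("var1", "var2", "var3")

# Apply the helper row by row; unname() drops any row labels from the result
df$var_match <- unname(apply(df[, cols], 1, row_match))
df$var_match
# expected values: 0 0 0 1 1 NA
```

Whether a row with a single non-NA value should be 0 or NA depends on what the column is meant to express, so treat this as one possible reading of the question, not the definitive one.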